ABSTRACT


Consider a machine with a cellular memory used to store a list $x \in X^i$, where $X$ is a finite alphabet and $i \in \mathbb{N}$. We investigate the machine representation of such a list and the implementation of common list operations such as determining a particular element and adding or deleting an element. Information-theoretic arguments are used in order to obtain lower bounds on storage and access costs for implementing variable-length lists and, in particular, stacks. Representations are discussed which attain these bounds separately and can sometimes attain both, although it is shown that some common representations for stacks cannot simultaneously achieve both. On the constructive side, we show that it is possible to implement a stack of any finite length so as to achieve Kraft storage and so that the number of memory cell accesses required to perform a PUSH or a TOP operation is always $O(\log n)$ but where, assuming a nonincreasing probability distribution on stack lengths, a POP operation requires on the average only a constant number of accesses.
CHAPTER 1
INTRODUCTION 


With the present-day widespread use of computers, it is important to be able 
to efficiently store information and execute operations. For a given problem, 
depending on the structural relationships between the data elements, we choose to 
use a particular type of data structure. In this thesis, we shall consider only the 
simplest information structure, a list; in particular, we discuss stacks and briefly mention some work with queues.
1.1 The Data Model 


The data model I will use for studying list structures is based on the model of
a storage and retrieval problem developed by Elias [5] and Welch [23]. A retrieval 
problem consists of a collection of data bases, any one of which may be observed at 
a given time, and a set of retrieval questions which may be asked of any data base. 
It may also be desired to perform updates; i.e., to transform the currently observed
data base into some other data base from the domain, the set of possible data bases. 
A retrieval system which solves a retrieval problem must have several 
components: 
(1) a method of representing any observed data base, 
(2) a method for answering any retrieval question about the observed data 
base, 
(3) a method for performing updates on the observed data base. 
For a given question, the method for answering the question must be independent 
of the observed data base; to allow the method to depend on the observed data 
base would presuppose some knowledge of the observed data base by the user in 
order to determine which method is appropriate. Thus, the method must give the correct answer no matter what the current data base is.


The following example illustrates what we mean by a storage and retrieval 
problem. We delay discussion of how a data base might be stored and how a query 
or update might be implemented until the next section, where we shall reconsider this example.


Example 1.1. Consider the problem of Rotary Fan Manufacturing Co., R.F.M., 
receiving mail orders for fans. Somehow R.F.M. must keep track of these orders to 
be filled. Exactly what information is needed depends on the questions and updates 
that will be executed. A data base corresponds to the current list of orders to be 
filled. The domain is the set of all possible data bases; i.e., all possible lists of fan 
orders. Notice that there are data bases of different sizes; in fact, it may be 
possible for a data base to have any integral size greater than or equal to zero. Of 
course, if R.F.M. wants to stay in business for long it had better be the case that 
shorter data bases are more probable than larger ones. 

Because old orders are continually being filled and new orders received, it 
must be possible to update the current order list; in particular, R.F.M. needs to be 
able to perform the following two updates. 

$u_1$: Process an order from the order list. This involves mailing the desired
fans and deleting the order from the current list. Thus, the size of the 
data base is decremented by one. 

$u_2$: A new order arriving must be placed on the order list, which results in
the size of the current data base being incremented by one. 

R.F.M. must also be prepared to answer queries concerning the current data 
base, such as whether or not John Doe's order is on the list, or whose order will be 
filled next. For instance, we might have the following set of questions. 

$q_1$: Is (name) a customer waiting to have his order processed?

$q_2$: Who is the $k$th customer in line; i.e., what order will be the $k$th to be served?


$q_3$: What are the $i$ most recently placed orders?

Exactly what information needs to be stored on the order list depends on the particular queries that will be made. ∎
1.2 List Structures


We consider a list problem to be a type of storage and retrieval problem, 
where each data base is a particular list. In general the size of the list may vary, 
and exactly how the list will be implemented depends on the specific questions and 
updates to be performed. In this section we introduce the basic list structures we 
will be concerned with in this thesis: stacks, queues, and dequeues. The 
appropriate operations will be formally defined later. 

A linear list is just an ordered sequence of items chosen from a particular set of elements (see, e.g., Knuth [14]; Aho, Hopcroft, and Ullman [1]). In many
instances, accessing of a list is restricted to the first and last elements; in particular, 
it may be the case that items can be added or deleted only at the ends of the list. 
Because these lists are frequently encountered, they have special names: stacks, 
queues, dequeues. 

A stack, also known as a push-down store or a LIFO (last-in, first-out) list, is a linear list for which all insertions and deletions are made at one end of the list, the top. For example, consider an initially empty stack; i.e., there are no elements in the list. Suppose we then insert two elements onto the stack:

Element 1, Element 2.

Since Element 1 was the first item put onto the stack, it occupies the bottom stack position and is the least accessible item; it cannot be removed until all other elements on the stack have been removed. To add (PUSH) a third element onto the stack, we locate the top of the stack and insert this new element, Element 3:

Element 1, Element 2, Element 3.

Element 3 is now at the top of the stack, and so if we delete (POP) an element from the stack, we are left with:

Element 1, Element 2.


Of course, if the stack had been empty we would not have been able to perform a 
POP operation, so there must be some way of detecting an empty stack. 

Exactly how one might choose to implement a stack is one of the issues 
discussed in this thesis. Figure 1.1 should help picture how the stack operations 
work and corresponds to one common type of implementation, where each item in 
the stack has a pointer which indicates the location of the previous stack item. An 
additional pointer always points to the top of the stack. Such a storage arrangement 
allows the stack operations to be performed in a straightforward way. In particular, 
a TOP operation is performed by reading the pointer in order to locate the top of 
the stack and then simply reading what the TOP value is. To perform a POP we 
locate the top of the stack, use this element to locate the second stack element, and 
then reset the top of stack pointer to this second element, which becomes the TOP 
element. Similarly, a PUSH operation can be implemented by first locating some 
free memory cell, into which the appropriate new stack value is inserted. This new cell has a pointer which is set to the same location as the top-of-stack pointer, and then the top-of-stack pointer is changed so that it points to the newly filled cell, our new top of stack. The pointers involved in these implementations are indicated in Figure 1.1. Notice that the directions of the pointers between the stack elements make reading "down" the stack straightforward, but there would be no way to read back "up" the stack. Of course, if the stack occupied a contiguous section of memory, there would be no need at all for pointers between the stack elements.


Figure 1.1. Stack Operations (diagram: PUSH or POP at the top of a linked stack; the bottom element is marked).
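The pointer-based stack just described can be sketched in Python. This is a minimal illustration under stated assumptions, not a representation analyzed later in the thesis: a dictionary `memory` stands in for the cellular memory, mapping a cell address to a (value, pointer-to-previous-item) pair, and `next_free` is a hypothetical allocator that simply hands out unused addresses.

```python
from typing import Optional


class LinkedStack:
    """Stack whose cells each point to the previous (lower) stack item."""

    def __init__(self) -> None:
        self.memory = {}                 # address -> (value, address of item below)
        self.top: Optional[int] = None   # top-of-stack pointer; None means empty
        self.next_free = 0               # hypothetical allocator: next unused address

    def is_empty(self) -> bool:
        return self.top is None

    def push(self, value: str) -> None:
        cell = self.next_free            # locate a free memory cell
        self.next_free += 1
        self.memory[cell] = (value, self.top)  # new cell points where the top pointed
        self.top = cell                  # top pointer now names the new cell

    def top_value(self) -> str:
        if self.top is None:
            raise IndexError("TOP of empty stack")
        return self.memory[self.top][0]

    def pop(self) -> str:
        if self.top is None:
            raise IndexError("POP of empty stack")
        value, below = self.memory.pop(self.top)  # read and release the top cell
        self.top = below                 # the second element becomes the top
        return value


s = LinkedStack()
for item in ("Element 1", "Element 2", "Element 3"):
    s.push(item)
print(s.pop())        # Element 3
print(s.top_value())  # Element 2
```

Each operation touches only the top-of-stack pointer and one or two cells; the price is one pointer stored with every element.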
A queue, also known as a FIFO (first-in, first-out) list, or a circular list, is a linear list for which all insertions are made at one end of the list, the rear, and all deletions are made at the other, the front. Thus, elements leave the list in the same order in which they entered. Suppose we insert (ENQUEUE) three elements onto an initially empty queue, first Element 1, then Element 2, then Element 3:

Element 3, Element 2, Element 1.

If we now delete (DEQUEUE) one element, we are left with:

Element 3, Element 2.
Figure 1.2 illustrates the queue operations. Notice that if the arrows between 
elements in Figure 1.2 were reversed, then after performing a DEQUEUE operation 
we would have no way to keep track of the location of the front of the queue. Of 
course, we might choose to store pointers going in both directions, but this would involve greater storage costs.


Figure 1.2. Queue Operations (diagram: ENQUEUE at the rear, DEQUEUE at the front).
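A sketch of the queue of Figure 1.2 in the same style, again assuming a dictionary for memory and a hypothetical address allocator. Each cell points from the front toward the rear, which is what lets DEQUEUE find the new front in a single step; with the arrows reversed, the line marked below would have nothing to follow.

```python
from typing import Optional


class LinkedQueue:
    """Queue whose cells point from the front toward the rear."""

    def __init__(self) -> None:
        self.memory = {}                 # address -> [value, address of next cell toward rear]
        self.front: Optional[int] = None
        self.rear: Optional[int] = None
        self.next_free = 0               # hypothetical allocator: next unused address

    def enqueue(self, value: str) -> None:
        cell = self.next_free
        self.next_free += 1
        self.memory[cell] = [value, None]     # the new rear has no successor yet
        if self.rear is None:                 # queue was empty: new cell is also the front
            self.front = cell
        else:
            self.memory[self.rear][1] = cell  # old rear now points to the new cell
        self.rear = cell

    def dequeue(self) -> str:
        if self.front is None:
            raise IndexError("DEQUEUE of empty queue")
        value, successor = self.memory.pop(self.front)
        self.front = successor                # follow the pointer to the new front
        if self.front is None:                # queue became empty
            self.rear = None
        return value


q = LinkedQueue()
for item in ("Element 1", "Element 2", "Element 3"):
    q.enqueue(item)
print(q.dequeue())  # Element 1
```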


A dequeue is a linear list for which all insertions and deletions are made at the ends of the list. Thus, a stack and a queue can each be viewed as a particular type of dequeue. One may also distinguish output-restricted or input-restricted dequeues, in which deletions or insertions, respectively, are allowed to take place at only one end. The ends are commonly referred to as left and right, although either an insertion or a deletion may occur at either end (see Figure 1.3). We shall not in this report discuss any results specifically concerning dequeues, but it appears that a dequeue can be viewed as a straightforward extension of a queue.


Figure 1.3. Dequeue Operations (diagram: insert or delete at the left end; insert or delete at the right end).
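Python's standard `collections.deque` already provides the four dequeue operations (`append` and `pop` at the right end, `appendleft` and `popleft` at the left end). The sketch below is only an illustration, not a representation studied in this thesis; it shows how a stack and a queue arise by restricting which ends are used.

```python
from collections import deque

d = deque()
d.append("x")        # insert at the right end
d.appendleft("w")    # insert at the left end
d.append("y")        # d now holds w, x, y (left to right)
right = d.pop()      # delete at the right end: "y"
left = d.popleft()   # delete at the left end: "w"

# A stack uses a single end; a queue inserts at one end and deletes at the other.
stack, queue = deque(), deque()
for item in ("Element 1", "Element 2", "Element 3"):
    stack.append(item)
    queue.append(item)
print(stack.pop(), queue.popleft())  # Element 3 Element 1
```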


Now that we have discussed these simple list structures, let us reconsider the issue of developing a solution to the system of Example 1.1.


Example 1.2. How R.F.M. develops a system to solve its order problem depends not only on finding an efficient means to store any data base, but also on what queries and updates it expects to be making most often. Thus, finding an "optimal" solution would depend on knowing some rather precise probabilities. On the other hand, we can at least make some general comments. The representation of a data base must include the names of the persons who ordered fans, as well as the other necessary information such as quantity ordered, address, payment, etc. It would probably make sense to store a data base as some sort of list structure. For simplicity, let us consider only a list of names and assume that each name also contains a pointer to the relevant corresponding information. In other words, we access any element in the list by reading the appropriate name. We have decided that each data base is to be represented by a list structure, but the type would be determined by R.F.M.'s desired processing order. Let us discuss several possible implementations.

One reasonable scheme would be to process orders FIFO; i.e., in the same order in which they arrived. This would correspond to implementing some sort of queue, perhaps as in Figure 1.2. In this case we always keep track of the next order to be processed and the last order received. Presumably, updates $u_1$ and $u_2$ would be easy to perform. On the other hand, answering question $q_1$ requires searching the queue for a particular name. Unless we have more information, this could require searching through the entire list. For a queue implementation, it would probably be straightforward to answer question $q_2$ by tracing $k$ items from the front. On the other hand, $q_3$ would probably be difficult to answer. To determine the one most recently placed order would require only a single access to the rear of the queue. But to determine the second most recently placed order is not as easy. Unless there is some way of knowing the "reverse pointers", it would be necessary to read all items from the front, keeping track of each previous item read, until we reach the rear of the queue. Of course, if we expected $q_3$ to be asked frequently, we might wish to alter our implementation scheme and store both forward and reverse pointers. At the price of increased storage, we could decrease the expense of answering $q_3$.
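The cost difference for $q_3$ can be made concrete by counting cell reads. The sketch below is illustrative only: the queue is given as hypothetical dictionaries `next_ptr` (front-to-rear pointers) and `prev_ptr` (the optional reverse pointers), and the customer names are invented.

```python
from collections import deque


def recent_forward_only(next_ptr, values, front, i):
    """Answer q3 with front-to-rear pointers only: walk the whole queue,
    remembering just the last i names read. Returns (names, cells read)."""
    window, cell, reads = deque(maxlen=i), front, 0
    while cell is not None:
        window.append(values[cell])
        reads += 1
        cell = next_ptr[cell]
    return list(reversed(window)), reads   # most recent order first


def recent_with_reverse(prev_ptr, values, rear, i):
    """Answer q3 with rear-to-front pointers: read only i cells."""
    names, cell, reads = [], rear, 0
    while cell is not None and len(names) < i:
        names.append(values[cell])
        reads += 1
        cell = prev_ptr[cell]
    return names, reads


values = ["Ann", "Bob", "Cy", "Dee", "Eve"]   # cell j holds the j-th order received
next_ptr = {0: 1, 1: 2, 2: 3, 3: 4, 4: None}
prev_ptr = {0: None, 1: 0, 2: 1, 3: 2, 4: 3}
print(recent_forward_only(next_ptr, values, 0, 2))  # (['Eve', 'Dee'], 5)
print(recent_with_reverse(prev_ptr, values, 4, 2))  # (['Eve', 'Dee'], 2)
```

With only forward pointers the read count grows with the queue length; with reverse pointers it is $i$, bought with one extra pointer per order.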

Another possible scheme would be to try to process orders as they are received, using a stack representation. Of course, R.F.M. Co. might lose a lot of business this way, because if it gets at all behind in processing orders, then some poor souls would be stuck indefinitely at the bottom of the stack. (And R.F.M. hasn't even considered the issue of cancelling an order from the middle of the list!) With such a LIFO implementation, we would expect $q_3$ to be easier to answer than it was with a queue implementation, but now $q_2$ doesn't even make sense, because there is no way to know when an order will be processed. Question $q_1$ would probably be no more or less difficult than it was for the queue.

If we expected to spend most of our time answering question $q_1$, we might want to sort the list of names alphabetically. (This would also make it easier to cancel an order.) But then we would need some additional means of indicating the processing order, such as a number field associated with the name. Unless we want to mail out the fans according to some alphabetical order, we would either need pointers to indicate the processing order or else updates might be very expensive. ∎
1.3 Computer-Implemented List Problems 


In this thesis we are concerned with computer-implemented solutions of list 
problems. Recall that in Section 1.1 we mentioned the three components that any such system must possess. Note that requiring that the algorithmic method for answering a question (or performing an update) be independent of the observed data base
implies a strict separation of "program" and "data". The "program" to answer a 
question must remain constant, while presumably the computer memory state 
(representing the observed "data") differs for different observed data bases. 

A computing system which finds the values of a function $f: D_f \to R_f$ can be viewed information-theoretically as a deterministic communications channel with input $d \in D_f$ and output value $f(d) \in R_f$. In [6], Elias considered the strictly
informational limits on computer performance and obtained lower bounds on 
storage and access required in the computation of a single function. This was done 
by allowing freedom of choice of representation of the input and decoding of the 
output. Viewing the contents of a computer's memory as a codeword, Elias [7] dealt with questions about the use of codewords which are not sequences but are sets of bits at addresses scattered throughout a shared memory. The next step was to extend these results to the computation of a family of functions defined on a common domain. An overview of much of this work is given by Elias [9], and an
analysis of the complexity of some simple retrieval problems with update was given 
by Elias and Flower [10]. Warner [22] has investigated the performance of 
retrieval systems for tables of entries. 

Let us note that information-theoretic approaches have been taken to other 
problems as well. The work of Kolmogorov [15] using minimal program length as a measure of computational complexity has an informational flavor. Also, Chaitin [4] viewed the contents of memory as a program to be executed. Other work has been done relating to problems of exact and partial match and their storage and access costs (Minsky and Papert [19], Rivest [20], [21]).
This thesis extends work that Elias has done, in which he has considered many issues concerned with storage and retrieval problems using a fixed-size linear
array. To allow the natural representation and manipulation of data, variable size 
arrays such as stacks, queues, dequeues, lists, and trees are frequently used. The 
fact that they have variable size makes different storage representations and 
accessing techniques appropriate; for instance, we must consider the basic 
operations of insertion of new elements and deletion of existing elements. 

We are interested in investigating certain costs associated with solving 
computer-implemented list problems. In particular, we are concerned with lower
bounds on the cost of storing a data base and on the cost of implementing a 
question or an update on the currently observed data base. The storage cost we 
measure in terms of the number of memory cells required for the data base 
representation. The implementation cost we measure in terms of the number of 
memory accesses required, which is in general directly related to the time taken to
perform an operation. 

We begin in Chapter 2 by discussing the formalism of our machine model and
what it means to solve a list problem. Chapter 3 discusses storage and access costs 
and explains the notions of Kraft storage and access, indicating the types of cost 
bounds we might expect to obtain. In Chapter 4 we consider the entire set of table 
lookup questions and investigate consequences of achieving Kraft storage and access. 
Possible implementations for the table lookup question set are explored in Chapter 
5, where we discuss three types of representations: fixed length, endmarker, and 
pointer. These same representation classes are analyzed in Chapter 6 with respect to 
implementing stacks. Finally, we summarize our results, discuss how the techniques we have developed can also be used to help obtain storage and access bounds for queues and dequeues, and point out directions for future work.
CHAPTER 2 


SOLUTION OF A LIST PROBLEM 


In this chapter we discuss our formal machine model and what it means to 
solve a list problem. This work is based on the model of a storage and retrieval 
problem developed by Elias [5], [6], [8]. We shall here introduce much of the terminology and notation that is used throughout the thesis. We first define a storage and retrieval problem, and then define our machine model and what it means for a machine to answer a question correctly. We discuss the distinction between the problem and machine domains and then define the machine representation of a problem domain. At this point we are finally in a position to state precisely what it means for a machine to solve a storage and retrieval problem. In the last section we summarize some of the ideas presented in the chapter, in order to clarify what we mean by the solution of a list problem.
2.1 Definition of a Storage and Retrieval Problem 


Let $F$ be a family of functions (operations) defined on a common domain $\mathbb{D}$ and indexed by some index set $J \subseteq \mathbb{N}$, $F = \{f_i \mid i \in J\}$. An operation $f_i \in F$ is an ordered pair of functions $f_i = (q_i, u_i)$, where $\mathrm{dom}(f_i) = \mathbb{D}$ and $\mathrm{ran}(u_i) \subseteq \mathbb{D}$. We refer to an element $d \in \mathbb{D}$ as a data base. Executing operation $f_i$ on data base $d \in \mathbb{D}$ returns the value $q_i(d)$ and has the side effect of updating $d$ to the new value $u_i(d)$; we denote this by $f_i(d) = (q_i(d), u_i(d))$. $Q = \{q_i \mid (q_i, u_i) \in F\}$ is called the question set and $U = \{u_i \mid (q_i, u_i) \in F\}$ is called the update set of $F$. We refer to $(F, \mathbb{D})$ as a storage and retrieval problem. If the data base $d$ is not changed as a consequence of executing $f_i$ (i.e., if $u_i(d) = d$ for every $f_i \in F$ and every $d \in \mathbb{D}$), then $(F, \mathbb{D})$ is said to be a static problem, and we may write it as $(Q, \mathbb{D})$. In general, however, the data base may change with time, in which case $(F, \mathbb{D})$ is a dynamic problem, or a problem with update.
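In code, an operation $f_i = (q_i, u_i)$ is naturally a pair of functions. The sketch below assumes data bases are Python strings and uses a hypothetical dynamic operation (report the length, then delete the last character) purely to illustrate executing $f_i$ and testing whether a problem is static; the helper names are ours.

```python
def execute(op, d):
    """f_i(d) = (q_i(d), u_i(d)): the answer and the updated data base."""
    q, u = op
    return q(d), u(d)


def is_static_problem(ops, domain):
    """A problem is static if no update changes any data base in the domain."""
    return all(u(d) == d for _, u in ops for d in domain)


domain = ["", "0", "1", "00", "01", "10", "11"]
identity = (lambda d: d, lambda d: d)                 # question and update are identities
length_then_drop_last = (len, lambda d: d[:-1])       # hypothetical dynamic operation

print(execute(length_then_drop_last, "01"))           # (2, '0')
print(is_static_problem([identity], domain))          # True
print(is_static_problem([identity, length_then_drop_last], domain))  # False
```

Note that the hypothetical update maps every string of length at most 2 to a shorter one, so its range stays inside the domain, as the definition requires.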


In this thesis, we shall consider storage and retrieval problems which represent list data structures; we refer to these as list problems, or simply problems. Later in this chapter we will be in a better position to explain precisely what we have in mind when we discuss the solution of a list problem. Let us begin by presenting a simple example of a storage and retrieval problem, which will illustrate some of the above terminology. Examples 2.2, 2.3, and 2.7 are extensions of this example.


Example 2.1. Let $\mathbb{D} = \{d_i \mid 0 \le i \le 6\}$, where each $d_i \in \mathbb{D}$ is a string of symbols from the set $X = \{0,1\}$; i.e., each $d_i \in X^*$:

$$
\begin{array}{ll}
d_0 = \lambda & d_4 = 01 \\
d_1 = 0 & d_5 = 10 \\
d_2 = 1 & d_6 = 11 \\
d_3 = 00 &
\end{array}
$$

Note that we write $d_0 = \lambda$ to indicate that $d_0$ is the null string, the string with no elements. Now consider two operations on $\mathbb{D}$, $f_1$ and $f_2$. The function $f_1 = (q_1, u_1)$ is simply the identity question and update:

$$q_1(d_i) = d_i, \qquad u_1(d_i) = d_i.$$

Since $u_1$ causes no change to the data base $d_i$, $f_1$ effectively has only a question component and so is a static operation. We define $f_2$, however, to be a dynamic operation: $f_2 = (q_2, u_2)$, where

$$
\begin{array}{ll}
q_2(d_0) = a_0 & u_2(d_0) = d_0 \\
q_2(d_1) = a_1 & u_2(d_1) = d_0 \\
q_2(d_2) = a_1a_1 & u_2(d_2) = d_1 \\
q_2(d_3) = a_1a_0a_0 & u_2(d_3) = d_1 \\
q_2(d_4) = a_1a_2a_1 & u_2(d_4) = d_2 \\
q_2(d_5) = a_2a_2a_1 & u_2(d_5) = d_2 \\
q_2(d_6) = a_1a_1a_0a_0 & u_2(d_6) = d_3
\end{array}
$$

Thus, executing the operation $f_2$ on data base $d_3$ gives the answer $a_1a_0a_0$ and changes the current data base, $d_3$, to the data base $d_1$. Notice that

$$\mathrm{dom}(f_1) = \mathrm{dom}(f_2) = \mathbb{D}, \quad \mathrm{ran}(u_1) = \mathbb{D}, \quad \mathrm{ran}(u_2) = \{d_0, d_1, d_2, d_3\} \subset \mathbb{D}.$$


So if we were to execute the sequence of operations $f_2, f_1, f_2, f_2$ on $d_3$, then we would expect the sequence of answers to be $a_1a_0a_0$, $d_1$, $a_1$, $a_0$ and the resulting data base to be $d_0$. ∎
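The answers $q_2(d_i)$ in Example 2.1 can be read as the ternary numeral for $i^2$ with digit $k$ written as $a_k$, which is the squaring performed by the machine of Example 2.2. The sketch below relies on that reading to reproduce the run $f_2, f_1, f_2, f_2$ from $d_3$. Data bases are identified by their index $i$, and the hypothetical table `U2_PARTIAL` holds only the three update values the run touches.

```python
def q2(i: int) -> str:
    """Answer of f2 on d_i: ternary digits of i*i, digit k written as a_k."""
    n, digits = i * i, []
    while True:
        digits.append(n % 3)
        n //= 3
        if n == 0:
            break
    return "".join(f"a{k}" for k in reversed(digits))


# Only the u2 entries this run touches: d3 -> d1, d1 -> d0, d0 -> d0.
U2_PARTIAL = {3: 1, 1: 0, 0: 0}


def f1(i):
    """Identity question and update."""
    return f"d{i}", i


def f2(i):
    return q2(i), U2_PARTIAL[i]


def run(ops, i):
    """Apply operations in order, collecting answers; return (answers, final index)."""
    answers = []
    for op in ops:
        answer, i = op(i)
        answers.append(answer)
    return answers, i


print(run([f2, f1, f2, f2], 3))  # (['a1a0a0', 'd1', 'a1', 'a0'], 0)
```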


We frequently denote the domain of a function, $\mathrm{dom}(f)$, by $D_f$, and similarly $\mathrm{ran}(f)$ by $R_f$. Where we have a set $F = \{f_i \mid i \in J\}$ of operations, we may find it convenient to write $D_i$ and $R_i$ for $D_{f_i}$ and $R_{f_i}$, respectively. If there is no possibility of confusion, we may simply omit the subscripts and write $D$ and $R$. Similarly, $D(S)$ denotes the domain of a set $S$. Note that when we discuss a problem $(F, \mathbb{D})$, we write $\mathbb{D}$ to refer to the problem domain, which happens to be the common domain of each function $f_i \in F$.
2.2 Definition of the Machine Model 


Our machine model is a deterministic, sequential, random-access, cell-addressable, halting automaton $\mathcal{M}$, with a memory $m$ consisting of $L$ cells (where $L$ may be infinite). The set of all possible contents of a memory cell, $B$, corresponds to $\mathcal{M}$'s finite input alphabet, and $B^L$ denotes the set of possible memory states. Via its memory, $\mathcal{M}$ stores a sequence $b \in B^L$, which it reads in some order determined by the structure of $\mathcal{M}$ and the values in $b$. $\mathcal{M}$ may or may not rewrite values as it reads the cells, but it eventually prints a sequence of output symbols chosen from some finite output alphabet $E$. Since $\mathcal{M}$ is deterministic, a given input (initial state of memory) always causes $\mathcal{M}$ to print the same output (if $\mathcal{M}$ halts), so $\mathcal{M}$ computes a partial function $\omega$ from inputs in $B^L$ to outputs in $E^*$. If we let $D(\mathcal{M}) \subseteq B^L$ be the set of inputs for which $\mathcal{M}$ halts in finite time and $R(\mathcal{M}) \subseteq E^*$ be the set of outputs which $\mathcal{M}$ prints before it halts, then each automaton $\mathcal{M}$ defines a "characteristic function" $\omega: D(\mathcal{M}) \to R(\mathcal{M})$. The only functions which $\mathcal{M}$ can actually compute are restrictions of its characteristic function to some subset of its acceptance set.
2.3 Machine Computation of a Static Function 


Now that we have in mind a machine definition, let us investigate in what sense a machine $\mathcal{M}$ with $L$ memory cells can compute a static function $q: D_q \to R_q$. Technically, a machine $\mathcal{M}$ can compute the values of a question $q: D_q \to R_q$ only when $D_q \subseteq D(\mathcal{M})$ and $R_q \subseteq R(\mathcal{M})$. It is often claimed, however, that a machine $\mathcal{M}$ computes a function $q$ even when the machine alphabets and the problem alphabets are not identical. In such a case, the user also has in mind two non-machine components: a coder and a decoder. The coder consists of some encoding relation $\tau: D_q \to 2^{B^L}$ from the domain of $q$ onto a subset $R_\tau \subseteq D(\mathcal{M})$; each $d \in D_q$ is taken into a subset $\tau(d) \subseteq B^L$, and any string $b \in \tau(d)$ is said to "represent" $d$. (We shall later use the symbol $\rho$ to stand for an encoding function, as explained in Sections 2.5 and 2.6. Using that terminology, our encoding relation $\tau$ will be seen to correspond to a relation $\bar\rho_L$.) The decoding function $\delta: R_\omega \to R_q$ maps the subset $R_\omega = \omega(R_\tau) \subseteq R(\mathcal{M})$ onto the range of $q$. The machine is said to compute $q$ correctly if, for any $d \in D_q$, when any $b \in \tau(d)$ is supplied to $\mathcal{M}$ and gives output $e = \omega(b) \in \omega \circ \tau(d)$, the decoding $\delta(e)$ of $e$ satisfies

$$q(d) = \delta(e) = \delta \circ \omega \circ \tau(d), \qquad \forall d \in D_q.$$

In particular, $\delta \circ \omega \circ \tau$ must be a function. These conditions are summarized in the following diagram, where all arrows denote total and onto functions or relations:

$$
\begin{array}{ccccccc}
D_q & \xrightarrow{\;\tau\;} & R_\tau & \subseteq & D(\mathcal{M}) & \subseteq & B^L \\
\big\downarrow q & & \big\downarrow \omega|_{R_\tau} & & \big\downarrow \omega & & \\
R_q & \xleftarrow{\;\delta\;} & R_\omega & \subseteq & R(\mathcal{M}) & \subseteq & E^*
\end{array}
$$
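When everything is finite, the correctness condition (for every $d \in D_q$ and every $b \in \tau(d)$, $\delta(\omega(b)) = q(d)$) can be checked mechanically. The checker below is generic; the toy problem (a parity question answered by a machine that prints cell 1 of a three-cell binary memory) is hypothetical and chosen only to exercise the check.

```python
def computes_correctly(q, tau, omega, delta, domain) -> bool:
    """True iff delta(omega(b)) == q(d) for every d in the domain and every b in tau(d)."""
    return all(delta(omega(b)) == q(d) for d in domain for b in tau(d))


# Hypothetical toy problem: is the number d (0..3) even or odd?
domain = range(4)


def q(d):
    return "even" if d % 2 == 0 else "odd"


def tau(d):
    """Encoding relation: the 2-bit numeral for d, then an arbitrary third bit."""
    return {format(d, "02b") + x for x in "01"}


def omega(b):
    """Characteristic function of the toy machine: print the contents of cell 1."""
    return b[1]


def omega_broken(b):
    """A machine that reads the unconstrained cell 2 instead."""
    return b[2]


delta = {"0": "even", "1": "odd"}.get   # decoder on the outputs {0, 1}

print(computes_correctly(q, tau, omega, delta, domain))         # True
print(computes_correctly(q, tau, omega_broken, delta, domain))  # False
```

The second call fails because $\tau$ is a relation: the cell that `omega_broken` reads may hold either bit, so $\delta \circ \omega \circ \tau$ is not a function.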


To help us understand all of this terminology, we consider the computation of question $q_2$ from the previous example.
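Example 2.2. Recall the question $q_2$ from Example 2.1, where $D_{q_2} = \{d_i \mid 0 \le i \le 6\}$ and $R_{q_2} \subseteq \{a_0, a_1, a_2\}^*$. Let $\mathcal{M}$ be a deterministic, sequential, halting automaton with a memory $m$ consisting of three cells. Let $B = \{0, 1, 2, \square\}$, where $\square$ is a blank symbol, and $E = \{0, 1, 2\}$. $\mathcal{M}$ operates as follows: it reads the string of inputs until it encounters a $\square$, reading in order memory cell 0, then cell 1, then cell 2; it interprets the string of characters from $\{0,1,2\}$ as the ternary representation of a natural number; $\mathcal{M}$ computes, also in ternary, the square of this number, prints it, and then halts. So

$$D(\mathcal{M}) = \{0,1,2\}\,\square\,\{0,1,2,\square\} \;\cup\; \{1,2\}\{0,1,2\}\,\square$$
$$= \{0\square0, 0\square1, 0\square2, 0\square\square, 1\square0, 1\square1, 1\square2, 1\square\square, 2\square0, 2\square1, 2\square2, 2\square\square, 10\square, 11\square, 12\square, 20\square, 21\square, 22\square\},$$
$$R(\mathcal{M}) = \{0, 1, 11, 100, 121, 221, 1100, 1211, 2101\}.$$

$\mathcal{M}$ computes $q_2$ correctly if we choose our encoding and decoding relations appropriately. Let $\tau: D_{q_2} \to 2^{B^3}$ be defined as follows:

$$
\begin{array}{ll}
\tau(d_0) = \{0\square0, 0\square1, 0\square2, 0\square\square\} & \tau(d_4) = \{11\square\} \\
\tau(d_1) = \{1\square0, 1\square1, 1\square2, 1\square\square\} & \tau(d_5) = \{12\square\} \\
\tau(d_2) = \{2\square0, 2\square1, 2\square2, 2\square\square\} & \tau(d_6) = \{20\square\} \\
\tau(d_3) = \{10\square\} &
\end{array}
$$

Thus, $R_\tau = D(\mathcal{M}) - \{21\square, 22\square\}$ and $R_\omega = R(\mathcal{M}) - \{1211, 2101\}$. Now define $\delta: R_\omega \to R_{q_2}$ by

$$
\begin{array}{ll}
\delta(0) = a_0 & \delta(121) = a_1a_2a_1 \\
\delta(1) = a_1 & \delta(221) = a_2a_2a_1 \\
\delta(11) = a_1a_1 & \delta(1100) = a_1a_1a_0a_0 \\
\delta(100) = a_1a_0a_0 &
\end{array}
$$

So the machine $\mathcal{M}$ with encoding $\tau$ and decoding $\delta$ computes $q_2$ correctly. For instance,

$$\delta \circ \omega \circ \tau(d_0) = \delta \circ \omega(0\square0) = \delta(0) = a_0 = q_2(d_0),$$
$$\delta \circ \omega \circ \tau(d_6) = \delta \circ \omega(20\square) = \delta(1100) = a_1a_1a_0a_0 = q_2(d_6). \qquad ∎$$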
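Example 2.2 is small enough to verify exhaustively. The sketch writes the blank symbol $\square$ as `#` (an assumption of the sketch), models $\omega$ as "read up to the first blank, square the ternary numeral", builds $\tau$ and $\delta$ as given above, and checks that $\delta \circ \omega \circ \tau(d_i) = q_2(d_i)$ for every $d_i$. The helper names are ours.

```python
BLANK = "#"  # stands in for the blank symbol of B = {0, 1, 2, blank}


def to_ternary(n: int) -> str:
    """Ternary numeral for a natural number n."""
    digits = ""
    while True:
        digits = str(n % 3) + digits
        n //= 3
        if n == 0:
            return digits


def omega(b: str) -> str:
    """Characteristic function of the 3-cell machine: read cells 0, 1, 2 up to the
    first blank, interpret them as a ternary numeral, print its square in ternary."""
    numeral = b.split(BLANK, 1)[0]
    return to_ternary(int(numeral, 3) ** 2)


# Encoding tau: d0, d1, d2 are one digit, a blank, then anything;
# d3..d6 are the two-digit numerals 10, 11, 12, 20 followed by a blank.
TAU = {i: {f"{i}{BLANK}{x}" for x in "012" + BLANK} for i in range(3)}
TAU.update({i: {to_ternary(i) + BLANK} for i in range(3, 7)})


def delta(e: str) -> str:
    """Decoding: ternary digit k becomes the answer symbol a_k."""
    return "".join(f"a{k}" for k in e)


Q2 = {0: "a0", 1: "a1", 2: "a1a1", 3: "a1a0a0", 4: "a1a2a1", 5: "a2a2a1", 6: "a1a1a0a0"}

print(all(delta(omega(b)) == Q2[i] for i, codes in TAU.items() for b in codes))  # True
print(omega("21" + BLANK), omega("22" + BLANK))  # 1211 2101
```

The last line confirms why $21\square$ and $22\square$ lie outside $R_\tau$: their outputs, $7^2$ and $8^2$ in ternary, are exactly the two elements removed from $R(\mathcal{M})$ to form $R_\omega$.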
2.4 The Problem Domain 


Because in this thesis we are concerned with representing list structures, we 
consider a data base $d \in \mathbb{D}$ to be a string of characters chosen from the problem alphabet $X$. For notational convenience, we formally represent $d$ as a set of $|d|$ ordered pairs, containing one value $d(n)$ from the alphabet $X$ for each $n \in \mathbb{N}$ less than $|d|$:

$$d = \{(n, d(n)) \mid 0 \le n < |d|,\ d(n) \in X\}.$$

When there is no chance of ambiguity, we may write $d = x_1x_2x_3x_4$ to stand for

$$d = \{(0, x_1), (1, x_2), (2, x_3), (3, x_4)\},$$

where each $x_i \in X$. Thus, what the formal ordered pair notation does is to explicitly state the implied order of characters in the string $d$. In an obvious way, the definition of $d$ could be extended to include countably infinite strings; i.e., we may wish to consider the size of a data base $d \in \mathbb{D}$ to be unbounded. In this thesis, we shall consider only problem domains $\mathbb{D}$ where, for all $d_1, d_2 \in X^i$, $d_1 \in \mathbb{D}$ if and only if $d_2 \in \mathbb{D}$. Thus, if we allow a string $d_1 \in X^i$ to be in the domain $\mathbb{D}$, then all strings in $X^i$ are included in $\mathbb{D}$. Certainly there might be instances where we would want to restrict character sequences, but unless we consider specific applications it would be difficult to characterize the domain. Therefore, we consider only problem domains $\mathbb{D}$ of the form $\mathbb{D} = \bigcup_{i \in J} X^i$, for some $J \subseteq \mathbb{N}$.
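A domain of the form $\mathbb{D} = \bigcup_{i \in J} X^i$ and the ordered-pair view of a data base are both easy to generate. The sketch below lists strings shortest first and then lexicographically, which happens to reproduce the numbering $d_0, \ldots, d_6$ of Example 2.1 for $X = \{0,1\}$ and $J = \{0,1,2\}$; the function names are illustrative.

```python
from itertools import product


def domain(alphabet, J):
    """D = union of X^i over i in J, listed shortest first, then lexicographically."""
    return ["".join(t) for i in sorted(J) for t in product(alphabet, repeat=i)]


def as_pairs(d: str) -> set:
    """Formal view of a data base: {(n, d(n)) | 0 <= n < |d|}."""
    return {(n, ch) for n, ch in enumerate(d)}


D = domain("01", {0, 1, 2})
print(D)                    # ['', '0', '1', '00', '01', '10', '11']
print(sorted(as_pairs(D[4])))  # [(0, '0'), (1, '1')]
print(as_pairs(D[0]))       # set()
```

The last line is the observation of Example 2.3 in code: the empty string $\lambda$ and the empty set of pairs describe the same data base.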


Example 2.3. In Example 2.1, the problem domain consists of seven data bases, $\mathbb{D} = \{d_i \mid 0 \le i \le 6\}$. The problem alphabet is $X = \{0,1\}$, and each $d_i \in X^*$. In particular,

$$\mathbb{D} = \bigcup_{i \in \{0,1,2\}} X^i = X^0 \cup X^1 \cup X^2.$$

The data base $d_4$, for example, is the string $01 \in X^2$, which can be formally written as $\{(0,0), (1,1)\}$. Similarly, we can denote each $d_i \in \mathbb{D}$:

$$
\begin{array}{ll}
d_0 = \{\} = \lambda & d_4 = \{(0,0), (1,1)\} = 01 \\
d_1 = \{(0,0)\} = 0 & d_5 = \{(0,1), (1,0)\} = 10 \\
d_2 = \{(0,1)\} = 1 & d_6 = \{(0,1), (1,1)\} = 11 \\
d_3 = \{(0,0), (1,0)\} = 00 &
\end{array}
$$

Notice that the data base $d_0$ is just the empty string, $\lambda$. When we view $d_0$ as being represented by a set of ordered pairs, then $d_0 = \{\} = \emptyset$. Thus, we might say either that $d_0 = \lambda$ or that $d_0 = \emptyset$, depending on our viewpoint at the moment. ∎
2.5 Machine Representation of the Problem Domain


As we have observed, a data base itself cannot be stored in memory. Instead, 
we store some encoding of the data base, a string of values from the alphabet $B$. Each $d \in \mathbb{D}$ is mapped by $\tau$ into some subset of $B^L$. It is unnecessarily restrictive, however, to require that an encoding $\tau$ specify values for every memory cell. In
fact, most computer systems allocate only certain sections of memory to a given 
user, and other users may write in the remaining cells of memory in ways unknown 
to the first user. In order to model practical memory allocation schemes such as 
linked lists (recall Section 1.2), it is necessary to allow an encoding to specify values 
for only some of the memory cells. 

Thus, we view $\tau(d)$ as some set of codewords, a subset of the code $C = \tau(\mathbb{D})$ (see Elias [8]). Each codeword $c \in C$ is itself a finite set

$$c = \{(j, c(j)) \mid j \in D(c)\}$$

of $|c|$ ordered pairs. The first coordinate of each pair $(j, c(j))$ is the integer address $j \in \mathbb{N}$ of a cell in memory, and the second coordinate is the value $c(j) \in B$ assigned by $c$ to be stored at that address. Thus, each codeword in $C$ is a partial function $c: \mathbb{N} \to B$ from integer addresses to values in $B$; its domain, $D(c)$, is a finite subset of $\mathbb{N}$.

We denote by $B^{\dagger}$ the class of all such partial functions from $\mathbb{N}$ to $B$ that are each defined on a finite domain. Thus, a codeword set $C$ is just a subset $C \subseteq B^{\dagger}$. The domain $D(C)$ of a set $C \subseteq B^{\dagger}$ is the union of the domains of its members:

$$D(C) = \bigcup_{c \in C} D(c).$$
c€C 


Example 2.4. Let $B = \{0,1\}$, and consider the code $C_1 = \{c_0, c_1, c_2\}$, where

$$c_0 = \{(0,0), (2,1)\}, \quad c_1 = \{(0,1), (1,0)\}, \quad c_2 = \{(1,1), (2,0)\}.$$

Each codeword $c_i$ is a partial function $c_i: \mathbb{N} \to \{0,1\}$, so $c_i \in B^{\dagger}$ and $C_1 \subseteq B^{\dagger}$. Notice that $D(c_0) = \{0,2\}$, $D(c_1) = \{0,1\}$, $D(c_2) = \{1,2\}$, and $D(C_1) = \{0,1,2\}$. We may find it convenient to represent $C_1$ as an array, as in Figure 2.1, where the $i$th row represents codeword $c_i$. The entries in each row correspond to the contents of the corresponding memory cells. The $j$th entry in row $c_i$ is the value $c_i(j)$ if $j \in D(c_i)$ and is blank if $j \notin D(c_i)$. Each column corresponds to a memory cell address, here 0, 1, or 2. ∎

                  D(C_1)
    address     0   1   2
    c_0         0       1
    c_1         1   0
    c_2             1   0

Figure 2.1. Representation of Code $C_1$ as an array.
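A codeword is a finite partial function from addresses to values, so a Python `dict` is a natural stand-in. The sketch below encodes $C_1$ this way, computes $D(c_i)$ and $D(C_1)$, and prints the rows of Figure 2.1, using `.` for a blank entry (a display choice of the sketch, not of the thesis).

```python
C1 = {
    "c0": {0: 0, 2: 1},   # address -> value
    "c1": {0: 1, 1: 0},
    "c2": {1: 1, 2: 0},
}


def dom(c: dict) -> set:
    """D(c): the addresses at which codeword c is defined."""
    return set(c)


def dom_code(code: dict) -> set:
    """D(C): the union of the domains of the codewords in C."""
    return set().union(*(dom(c) for c in code.values()))


def row(c: dict, width: int) -> str:
    """One row of the array: c(j) where defined, '.' where blank."""
    return " ".join(str(c[j]) if j in c else "." for j in range(width))


for name, c in C1.items():
    print(name, row(c, 3))
# c0 0 . 1
# c1 1 0 .
# c2 . 1 0
print(sorted(dom_code(C1)))  # [0, 1, 2]
```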


Recall that we write $B^L$ to denote the set of all $L$-celled memories. Then a memory state $m$ is in $B^L$ if $m \in B^{\dagger}$ and its domain is $D(m) = \{0, 1, 2, \ldots, L-1\}$, so that

$$m = \{(0, m(0)), (1, m(1)), \ldots, (L-1, m(L-1))\},$$

where the first member of each pair $(n, m(n))$ is the integer address $n \in \mathbb{N}$ of a cell in memory, and the second is the contents $m(n) \in B$ of cell $n$. (Recall that it is possible that $L$ be infinite.) A codeword $c \in C$ is stored in a memory $m \in B^L$ by setting $m(j) = c(j)$ for all $j \in D(c)$. Other users may fill in the values of the $L - |c|$ cells not occupied by $c$ but must leave $c$ itself undisturbed.
For any string $b \in B^{\dagger}$ we can define its $L$-closure, $\bar b_L$, as the set

$$\bar b_L = \{m \in B^L \mid b \subseteq m\}$$

of memories in $B^L$ that store $b$, in the sense that the (address, value) pairs in $b$ are included among those in $m$. For $L \le \max D(b)$, $\bar b_L = \emptyset$. Where the value $L$ is understood, we frequently write $\bar b$ to mean $\bar b_L$. Note that $|\bar b_L| = |B|^{L - |b|}$ whenever $L > \max D(b)$. Define the set

$$B^* = \bigcup_{L \ge 0} B^L$$

of all finite memories that store values from $B$. Then for $b \in B^*$, $D(b) = \{0, 1, \ldots, |b| - 1\}$. So the $L$-closure of $b$ contains all sequences in $B^L$ with prefix $b$: $\bar b_L = b\,B^{L - |b|}$.


Example 2.5. Recall code $C_1$ from Example 2.4, where $\mathcal{B} = \{0,1\}$. Since $|\bar{c}_L| = |\mathcal{B}|^{L-|c|}$, we have $|(\bar{c}_0)_3| = 2^{3-2} = 2$. So for $L = 3$ there are two memory states which contain the codeword $c_0$. In particular,

$$(\bar{c}_0)_3 = \{m \in \mathcal{B}^3 \mid c_0 \subseteq m\} = \{\{(0,0), (1,0), (2,1)\},\ \{(0,0), (1,1), (2,1)\}\}.$$

We can represent the 3-closures of $c_0, c_1, c_2$ in array form, as in Figure 2.2. Notice that no matter how other users may fill in memory cells $n$ where $n \notin D(c_i)$, it is always possible to tell precisely which codeword $c_i$ is being stored. Since $L = 3$ and $\mathcal{B} = \{0,1\}$, there are eight possible memory states, six of which store codewords from $C_1$.

Also note that

$$(\bar{c}_0)_4 = \{\{(0,0), (1,0), (2,1), (3,0)\},\ \{(0,0), (1,0), (2,1), (3,1)\},\ \{(0,0), (1,1), (2,1), (3,0)\},\ \{(0,0), (1,1), (2,1), (3,1)\}\}.$$

Since $c_1 \in \mathcal{B}^2$, $c_1 \in \mathcal{B}^*$, but $c_0, c_2 \notin \mathcal{B}^*$.


Figure 2.2. Representation of the closures of codewords in $C_1$.
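The closures in Example 2.5 can be enumerated by brute force. The following is a minimal sketch, assuming small $L$ and representing a memory state as a tuple of $L$ symbols; the function name `closure` is our own. It checks the count $|\bar{c}_L| = |\mathcal{B}|^{L-|c|}$ and that the 3-closures of $C_1$ cover six of the eight memory states.

```python
from itertools import product


def closure(b, L, alphabet):
    """The L-closure of b: every memory m in B^L (a tuple of L symbols)
    that agrees with b on all of b's (address, value) pairs."""
    if b and max(b) >= L:  # b does not fit in an L-cell memory
        return []
    return [m for m in product(alphabet, repeat=L)
            if all(m[j] == v for j, v in b.items())]


B = (0, 1)
C1 = [{0: 0, 2: 1}, {0: 1, 1: 0}, {1: 1, 2: 0}]  # Example 2.4
closures = [closure(c, 3, B) for c in C1]
print(closures[0])  # [(0, 0, 1), (0, 1, 1)]

# Each 3-closure has |B|^(3 - |c|) = 2 members.
assert all(len(cl) == len(B) ** (3 - len(c)) for c, cl in zip(C1, closures))

stored = set().union(*closures)
print(len(stored), "of", len(B) ** 3)  # 6 of 8
print(len(closure(C1[0], 4, B)))       # 4
```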


Having discussed what we mean by an encoding $\tau: \mathbb{D} \to \mathcal{B}^L$ and a code $C \subset \mathcal{B}^{\dagger}$, we can now explain what we shall mean by a representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$. Throughout the thesis, unless otherwise specified, we always make the assumption that $\rho$ is a one-to-one function. Thus $\rho(d)$ is a single codeword in $\mathcal{B}^{\dagger}$, and

$$(\forall d_i, d_j \in \mathbb{D})\,(i \ne j \Rightarrow \rho(d_i) \ne \rho(d_j)).$$

The one-to-one condition guarantees that distinct data bases $d_i$ and $d_j$ map to distinct codewords. Since

$$\bar{\rho}_L(d) = \{m \in \mathcal{B}^L \mid \rho(d) \subseteq m\},$$

we can see that the relation $\bar{\rho}_L$ corresponds to the relation $\tau$ in Section 2.3. When $\mathfrak{M}$'s memory contains precisely $L$ cells, a specification of a representation $\rho$ indicates, for any $d \in \mathbb{D}$, that the cells in $D(\rho(d))$ be filled in as specified and that the remaining cells can be filled in any possible way by other users.

For instance, suppose we have some representation $\rho$ for which $\rho(d_0) = \{(0,1), (2,0)\}$; i.e., $d_0 \in \mathbb{D}$ is represented by any memory state in which $m(0) = 1$ and $m(2) = 0$. Since the value $m(1)$ to be stored in cell 1 is not specified, cell 1 corresponds to a "don't care". For $L = 3$, we shall find it convenient to write $\rho(d_0) = 1\_0$ to mean $\rho(d_0) = \{(0,1), (2,0)\}$. Where $L$ is understood, we may even write $\rho(d_0) = 1\_0$ rather than $\rho(d_0) = 1\_0\_\_$ for $L = 5$; i.e., we may suppress all trailing "don't cares", which serve simply as place holders.

We saw in Example 2.5 that if $c_i \in C_1$ is stored in memory, then it is always possible to distinguish $c_i$, no matter what other users have done with cells not in $D(c_i)$. In other words, there is no memory state in $\mathcal{B}^L$ that stores both $c_i$ and $c_j$ for $i \ne j$. When this is the case, we say that $c_i$ and $c_j$ are distinguishable.


Definition. Let $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$, and let $d_i, d_j \in \mathbb{D}$. Then $\rho(d_i)$ and $\rho(d_j)$ are said to be distinguishable if and only if

$$\bar{\rho}_L(d_i) \cap \bar{\rho}_L(d_j) = \emptyset$$

for any $L > \max\{\max D(\rho(d_i)), \max D(\rho(d_j))\}$.

In other words, a code $C \subset \mathcal{B}^{\dagger}$ is distinguishable if and only if the closures of its members are pairwise disjoint (see Elias [8]).

If there exist $d_1, d_2 \in \mathbb{D}$ such that $\rho(d_1)$ and $\rho(d_2)$ are not distinguishable, then for some memory state $m_0$ it is not possible to tell whether $d_1$ or $d_2$ is stored; in fact, $m_0$ represents both $d_1$ and $d_2$. We do not want to allow this loss of information and so make the following formal definition of a representation.


Definition. We say that a function $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ is a representation if and only if for all $d_i, d_j \in \mathbb{D}$ with $d_i \ne d_j$, $\rho(d_i)$ and $\rho(d_j)$ are distinguishable.


Example 2.6. Let $\mathbb{D} = \{d_0, d_1, d_2\}$, $\mathcal{B} = \{0,1\}$, and $L = 3$. Consider the function $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ defined by


$\rho(d_0) = 01$
$\rho(d_1) = 10$
$\rho(d_2) = \_00$


Then $\rho$ is not a representation, because it does not have disjoint 3-closures. In particular, $\rho(d_1)$ and $\rho(d_2)$ are not distinguishable:

$$\bar{\rho}(d_1) \cap \bar{\rho}(d_2) = \{100, 101\} \cap \{000, 100\} = \{100\}.$$
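Distinguishability can be tested mechanically. The sketch below is our own illustration (function names hypothetical): it compares closures at the smallest admissible $L$, which suffices because a memory of any larger size lying in both closures would restrict, on its first $L$ cells, to a memory lying in both $L$-closures. It reproduces the failure for $\rho(d_1) = 10$ and $\rho(d_2) = \_00$ and confirms that the code $C_1$ of Example 2.4 is distinguishable.

```python
from itertools import combinations, product


def closure(b, L, alphabet):
    """Set of memories in B^L (tuples) that contain every pair of b."""
    if b and max(b) >= L:
        return set()
    return {m for m in product(alphabet, repeat=L)
            if all(m[j] == v for j, v in b.items())}


def distinguishable(b1, b2, alphabet):
    """Disjointness of the two closures, tested at the smallest admissible L.
    Disjointness at this L implies disjointness at every larger L."""
    L = 1 + max([*b1, *b2], default=-1)
    return not (closure(b1, L, alphabet) & closure(b2, L, alphabet))


def is_representation(rho, alphabet):
    """rho maps data-base names to codewords; every pair must be distinguishable."""
    return all(distinguishable(rho[x], rho[y], alphabet)
               for x, y in combinations(rho, 2))


B = (0, 1)
rho_d1 = {0: 1, 1: 0}  # rho(d_1) = 10
rho_d2 = {1: 0, 2: 0}  # rho(d_2) = _00
print(closure(rho_d1, 3, B) & closure(rho_d2, 3, B))  # {(1, 0, 0)}
print(distinguishable(rho_d1, rho_d2, B))            # False

C1 = {"c0": {0: 0, 2: 1}, "c1": {0: 1, 1: 0}, "c2": {1: 1, 2: 0}}
print(is_representation(C1, B))                      # True
```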
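Example 2.7. Let us define the function $\rho: \mathbb{D} \to \mathcal{B}^*$, over $\mathcal{B} = \{0,1,2,\square\}$, by

$$\rho(d_0) = 0\square, \quad \rho(d_1) = 1\square, \quad \rho(d_2) = 2\square, \quad \rho(d_3) = 10\square,$$
$$\rho(d_4) = 11\square, \quad \rho(d_5) = 12\square, \quad \rho(d_6) = 20\square.$$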


Notice that

$$\bar{\rho}_3(d_0) = \overline{\{(0,0), (1,\square)\}}_3 = \{0\square 0,\ 0\square 1,\ 0\square 2,\ 0\square\square\}.$$

Thus, there are four memory states that correspond to a representation of $d_0$, and the relation $\bar{\rho}$ is identical to the relation $\tau$ of Example 2.2.


From now on, we define an encoder by specifying a representation function $\rho$. Then any string $b \in \bar{\rho}_L(d)$ represents the data base $d$.
2.6 Solution of Dynamic Problems 


In Section 2.3 we explained what it means for a machine $\mathfrak{M}$ to answer a question $q$ correctly. Now that we have also discussed what we mean by a representation, we can state explicitly what we mean when we say that a machine $\mathfrak{M}$ solves some list problem.

We can extend the notion of the computation of a function (question) $q$ to include the solution of a set of questions $Q = \{q_i \mid i \in J\}$, where each $q_i: \mathbb{D} \to R_i$ maps a common domain $\mathbb{D}$ onto its own range $R_i$. Since the ranges are in general different for different questions, a set $\Delta = \{\delta_i \mid i \in J\}$ of different decodings is allowed. For the solution of the family of questions $Q$, we introduce a set $\mathfrak{M} = \{\mathfrak{M}_i \mid i \in J\}$ of machines with a family $\{\omega_i \mid i \in J\}$ of different characteristic functions, where $\omega_i$ is defined on $D(\mathfrak{M}_i)$. We can consider $\mathfrak{M}$ to be a single device with a set $S = \{s_i \mid i \in J\}$ of distinct initial states, or programs; $\mathfrak{M}_i$ is the submachine corresponding to $\mathfrak{M}$ started in the initial state $s_i$. We say that $(\mathfrak{M}, \rho, \Delta)$ solves $(Q, \mathbb{D})$ if, for all $i \in J$, $\mathfrak{M}_i$ computes $q_i$ correctly. In other words, if $(\mathfrak{M}_i, \rho, \delta_i)$ computes $q_i$, then for any $m \in \bar{\rho}_L(d)$, $q_i(d) = \delta_i \circ \omega_i(m)$.


Having seen what it means for a machine to solve a static problem $(Q, \mathbb{D})$, let us now extend this to include updates. Recall that in our discussion of the machine model, it was mentioned that $\mathfrak{M}$ may rewrite some of its memory cells. Thus, when given some input $m_0$, $\mathfrak{M}$ may halt in a new memory state $m_1$. For a machine $\mathfrak{M}$ which computes a single function $f$, if we want to be able to compute $f$ several times in succession, then it is natural to require that this new memory state be in $\mathfrak{M}$'s acceptance set. In fact, if $\mathfrak{M}_i$ solves $(q_i, u_i)$ correctly, then performing an update function on any memory state containing $\rho(d)$ leaves us with a memory state that is a representation of the problem-domain update $u_i(d)$. In general, we want a machine $\mathfrak{M}$ to compute a family of functions $F$, and so we represent our update functions in the machine domain by the family of functions $T = \{\hat{u}_i \mid i \in J\}$, where $\hat{u}_i: D(\mathfrak{M}) \to D(\mathfrak{M})$ for

$$D(\mathfrak{M}) = \bigcup_{i \in J} D(\mathfrak{M}_i) \subseteq \mathcal{B}^L.$$

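Definition. Consider the machine $\mathfrak{M} = \{\mathfrak{M}_i \mid i \in J\}$ with the family $\{\omega_i \mid i \in J\}$ of characteristic functions and the family $T = \{\hat{u}_i \mid i \in J\}$ of update functions. We say that $(\mathfrak{M}, \rho, \Delta)$ solves the dynamic problem $(F, \mathbb{D})$ if the following conditions are satisfied for all $f_i = (q_i, u_i) \in F$ and all $d \in \mathbb{D}$:

(1) $q_i(d) = \delta_i \circ \omega_i \circ \bar{\rho}_L(d)$;
(2) $\hat{u}_i(\bar{\rho}_L(d)) \subseteq \bar{\rho}_L(u_i(d))$.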


2.7 Solution of a List Problem 


In this section we merely want to summarize what we shall mean when we talk about the solution of a list problem.

First, recall from Section 2.1 that a list problem is a storage and retrieval problem $(F, \mathbb{D})$ where the domain elements have some list structure; e.g., they may be stacks. In any case the problem domain $\mathbb{D}$ consists of strings of characters chosen from the problem alphabet $X$ and is of the form $\mathbb{D} = \bigcup_{i \in J} X^i$ for some $J \subseteq \mathbb{N}$. For any $d \in \mathbb{D}$, we want to be able to perform the operations in $F$; e.g., TOP (return the value at the top of the stack) and POP.
If a machine $\mathfrak{M}$ is to solve the list problem $(F, \mathbb{D})$, then there must be some way to represent each $d \in \mathbb{D}$ in the cells of $\mathfrak{M}$'s memory with machine alphabet $\mathcal{B}$. In particular, there is some one-to-one representation function $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$, and any $\rho(d)$ stored in $m$ can be viewed as some sort of codeword. The representation has the property that it is always possible to determine what (if any) codeword is currently stored in memory; what other users do cannot interfere with this determination.

Suppose the current memory state is $m_0$, where $m_0 \in \bar{\rho}_L(d)$. Then $\mathfrak{M}_i$ will output the answer $\omega_i \circ \bar{\rho}_L(d)$ and halt in a new memory state in $\hat{u}_i(\bar{\rho}_L(d))$. If we claim that $\mathfrak{M}_i$ computes the function $f_i = (q_i, u_i) \in F$, then $\hat{u}_i(\bar{\rho}_L(d)) \subseteq \bar{\rho}_L(u_i(d))$ and there must be some decoding function $\delta_i$ such that $q_i(d) = \delta_i \circ \omega_i \circ \bar{\rho}_L(d)$. In other words, $\mathfrak{M}_i$ outputs the machine representation of $q_i(d)$ and halts in a memory state which is included in the set of memory states that represent $u_i(d)$.

We say that $(\mathfrak{M}, \rho, \Delta)$ solves the list problem $(F, \mathbb{D})$ if the above conditions are satisfied for all $f_i \in F$ and for all $d \in \mathbb{D}$. For simplicity, we shall also assume that each decoding function $\delta_i \in \Delta$ is one-to-one. Thus, we speak of a system $(\mathfrak{M}, \rho)$ solving a problem $(F, \mathbb{D})$.

When we discuss the machine solution of a problem $(F, \mathbb{D})$, we have in mind a representation of the domain $\mathbb{D}$ in memory and some collection $\Omega$ of algorithms or programs which compute the functions $F$. Any algorithm $\Omega_i$ that we discuss can be implemented by a machine $\mathfrak{M}_i$, as defined above. Since we do not, however, always want to concern ourselves with all the details of the machine itself, we shall henceforth speak of a system $(\Omega, \rho)$ solving a problem $(F, \mathbb{D})$. Thus we specify an implementation by defining the function $\rho$ and by presenting, in some (usually program-like) form, the set of algorithms $\Omega$ (which can be implemented by a machine $\mathfrak{M}$).
CHAPTER 3
STORAGE AND ACCESS COSTS 


In Section 3.1 we introduce various system costs involved in solving a problem. Since in this thesis we are concerned with obtaining lower bounds on storage and access costs, these costs are discussed more fully in Sections 3.2 and 3.3, respectively. We first define our cost measures and then present some basic results. For further information the interested reader is referred to Elias [6], [9], [10].
3.1 System Costs 


Many different systems can be used to solve the same problem, and the choice 
among them depends on their relative costs. There are three basic components of 
system cost: 

(1) Storage cost. There is always some sort of purchase or rental cost for 
the memory used to store the representation of a data base. 

(2) Access cost. This refers to the number of memory cell accesses made 
by an algorithm or machine and is a partial indication of the time 
required by a system to answer a question or perform an update. 

(3) Processor cost. This involves the costs in memory and logic of the
algorithm or machine $\mathfrak{M}$ itself.

For several reasons, we do not in this thesis consider the processor cost. First, 
any such measure would reflect characteristics of the particular machine, and it is 
therefore difficult to determine an appropriate measure. We have deliberately tried 
to let our machine model be as general as possible. Second, the list implementations 
we do consider are in general quite straightforward and therefore a system which 
does well for both storage and access costs probably would not have a prohibitive 
processor cost. Third, the storage-access trade-off is easier to recognize and we do 


not want the current analysis to become too complex. 


3.2 Storage Costs 


One measure of the memory requirements of a retrieval system $(\Omega, \rho)$ solving a problem $(F, \mathbb{D})$ is the number of memory cells dedicated to the storage of a representation in memory.


Definition. Consider a system $(\Omega, \rho)$ solving a problem $(F, \mathbb{D})$, and assume that $\rho$ is a function. The memory storage cost $|\rho(d)|$ associated with any data base $d \in \mathbb{D}$ is the number of memory cells for which representation $\rho$ specifies a value when representing $d$:

$$|\rho(d)| \triangleq |D(\rho(d))|.$$

Thus, we define $|\rho(d)|$ to be the number of memory cells occupied by the codeword $\rho(d)$. There is, however, no requirement that the set of occupied cells be contiguous; i.e., there may be "gaps" or "holes" in the representation. Because we are essentially concerned with obtaining lower bounds, we charge only for the cells actually occupied by $\rho(d)$ and do not charge for these gaps.


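Example 3.1. Let $\mathcal{B} = \{0,1\}$ and define the code $C_2 = \{c_0, c_1, c_2, c_3, c_4\}$ as follows:

$$c_0 = 0\_1, \quad c_1 = 10, \quad c_2 = \_10, \quad c_3 = 000, \quad c_4 = 111.$$

Suppose that $\mathbb{D} = \{d_0, d_1, d_2, d_3, d_4\}$ and the representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ is defined by $\rho(d_i) = c_i$. Then

$$|\rho(d_0)| = |\rho(d_1)| = |\rho(d_2)| = 2 \quad\text{and}\quad |\rho(d_3)| = |\rho(d_4)| = 3.$$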


Certainly the issue of memory management is an important one, because it may be difficult to allocate efficiently to a single user the unspecified memory cells corresponding to holes in another user's memory space. Elias [9] has addressed the problem of assigning a contiguous section of memory, defining the span of a representation $\rho$ to be the smallest set of contiguous memory cells capable of holding the representation of any domain element. Many representation schemes we shall construct will be able to avoid such gaps, at least when the problem alphabet is of the appropriate size.

Our storage cost measure does not indicate the complexity of the encoding $\rho$.
For a static problem, storing a representation would be only a one-time task. When 
we consider dynamic problems, the complexity of the representation will evidence 
itself in the costs of performing updates. In general, a complicated encoding results 
in higher access costs. 

Consider a code $C \subset \mathcal{B}^{\dagger}$ that has the property that for each $c \in C$, $D(c) = \{0, 1, \ldots, |c|-1\}$; i.e., $C \subset \mathcal{B}^*$. Then $C$ is said to be a prefix code, or to be prefix-free, if none of its members is a prefix of any other. In other words, a prefix-free set $C \subset \mathcal{B}^*$ has the property that

$$(\forall c_i, c_j \in C)\,(c_i \ne c_j \Rightarrow c_i \not\subseteq c_j).$$

As noted by Elias [8], a code $C \subset \mathcal{B}^*$ is distinguishable if and only if it is a prefix code.

The well-known Kraft inequality [2], [12], [16] states that a necessary and sufficient condition for the existence of a prefix code with codeword lengths $\ell_1, \ell_2, \ldots, \ell_k$ and codeword characters chosen from the alphabet $\mathcal{B}$ is that

$$\sum_{i=1}^{k} |\mathcal{B}|^{-\ell_i} \le 1.$$
This result is probably most easily seen by recalling the simple correspondence between prefix codes and labeled trees. Each node corresponds to a memory cell number, and the branch labels correspond to the cell contents; i.e., there are $|\mathcal{B}|$ branches from each node. Each codeword is associated with a distinct leaf. We adopt the convention that the leftmost branch of each node always corresponds to the same element $b_0 \in \mathcal{B}$, and similarly for each of the other branches. For full trees this convention eliminates the need for writing the labels on branches emanating from non-root nodes. In particular, for $\mathcal{B} = \{0,1\}$, we always let a leftward branch correspond to a zero and a rightward branch to a one.


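Example 3.2. Recall the representation $\rho: \mathbb{D} \to \mathcal{B}^*$ from Example 2.7. The code

$$\rho(\mathbb{D}) = \{0\square, 1\square, 2\square, 10\square, 11\square, 12\square, 20\square\}$$

is a prefix code and satisfies the Kraft inequality because

$$\sum_{c \in \rho(\mathbb{D})} |\mathcal{B}|^{-|c|} = 3 \cdot 4^{-2} + 4 \cdot 4^{-3} = \tfrac{1}{4} \le 1.$$

The tree corresponding to the code $\rho(\mathbb{D})$ is illustrated in Figure 3.1.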


Figure 3.1. Tree corresponding to $\rho$ from Example 3.2.
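The two claims of Example 3.2 are easy to check exactly. This is a minimal sketch, assuming (as Example 3.4 states) that $\mathcal{B} = \{0,1,2,\square\}$ and that $\rho(d_n)$ is $n$ written in ternary followed by the endmarker; the character `#` stands in for $\square$, and the helper names are hypothetical. Exact rational arithmetic via `fractions.Fraction` avoids floating-point error in the Kraft sum.

```python
from fractions import Fraction


def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    return not any(a != b and b.startswith(a) for a in code for b in code)


def kraft_sum(lengths, q):
    """Exact value of the sum of q**(-l) over the codeword lengths l."""
    return sum((Fraction(1, q ** l) for l in lengths), Fraction(0))


END = "#"  # stand-in for the endmarker symbol of Example 2.7
alphabet = ["0", "1", "2", END]
ternary = ["0", "1", "2", "10", "11", "12", "20"]  # d_0, ..., d_6
code = [t + END for t in ternary]

print(code)
print(is_prefix_free(code))                              # True
print(kraft_sum([len(c) for c in code], len(alphabet)))  # 1/4
```

Without the endmarker the same ternary strings would not be prefix-free (for instance, `1` is a prefix of `10`), which is why the endmarker is needed for distinguishability here.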


Elias has extended the Kraft inequality to any distinguishable code $C \subset \mathcal{B}^{\dagger}$.


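Theorem 3.1. (Elias [8]). Let $C \subset \mathcal{B}^{\dagger}$ be distinguishable. Then

$$\sum_{c \in C} |\mathcal{B}|^{-|c|} \le 1. \qquad (3.1)$$

Equivalently, consider any representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$. Then

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} \le 1. \qquad (3.2)$$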


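Proof: Let

$$C_L = \{c \in C \mid L > \max D(c)\}$$

be the subset of the code $C$ whose elements can be stored in an $L$-cell memory. Since $C$ is distinguishable, the closures of its members are disjoint and we have

$$\bigcup_{c \in C_L} \bar{c}_L \subseteq \mathcal{B}^L.$$

Recalling also that $|\bar{c}_L| = |\mathcal{B}|^{L-|c|}$, we obtain

$$\sum_{c \in C_L} |\mathcal{B}|^{L-|c|} \le |\mathcal{B}|^L.$$

Now dividing through by $|\mathcal{B}|^L$ gives

$$\sum_{c \in C_L} |\mathcal{B}|^{-|c|} \le 1.$$

Since $C_L \subseteq C_{L+1}$,

$$\sum_{c \in C_L} |\mathcal{B}|^{-|c|} \le \sum_{c \in C_{L+1}} |\mathcal{B}|^{-|c|} \le 1,$$

and so

$$\lim_{L \to \infty} \Big( \sum_{c \in C_L} |\mathcal{B}|^{-|c|} \Big) \le 1.$$

Since every $c \in C$ belongs to $C_L$ for all sufficiently large $L$, this limit is the full sum over $C$, which proves (3.1). Since any representation $\rho$ is by definition distinguishable, the Kraft inequality also holds for representation storage costs and thus (3.2) follows.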


Theorem 3.1 is a statement about distributions of the storage measure $|\rho(d)|$ for any representation $\rho$ of domain $\mathbb{D}$. Not all data bases in $\mathbb{D}$ can have short representations, since a small value of $|\rho(d)|$ corresponds to a large term in the Kraft sum. If some of the data bases have relatively short representations, then others must have relatively long representations. If, in fact, we have equality in the Kraft sum, then no data base representation can be shortened without lengthening another data base representation.


Definition. We say that a representation $\rho$ achieves Kraft storage if and only if the Kraft sum of equation (3.2) is satisfied with equality:

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = 1. \qquad (3.3)$$

Similarly, a code $C$ achieves Kraft storage if the Kraft sum of equation (3.1) is equal to one.


We can also extend our usage of trees to correspond to any distinguishable code $C \subset \mathcal{B}^{\dagger}$. However, since we do not restrict ourselves to prefix codes (i.e., we allow scattered representations), we would not necessarily choose to have the memory cells read in order $0, 1, 2, \ldots$ on the path to every leaf. This and the result of Theorem 3.1 are illustrated in the following example.


Example 3.3. a) For code $C_1$ of Example 2.4, $|\mathcal{B}| = 2$ and

$$\sum_{c \in C_1} 2^{-|c|} = 3 \cdot 2^{-2} = \tfrac{3}{4}.$$

A tree corresponding to $C_1$ is given in Figure 3.2a, with the memory cells listed in order 0, 1, 2. On the other hand, we might choose to represent $C_1$ by the tree in Figure 3.2b. In any case, $C_1$ does not achieve Kraft storage.


Figure 3.2. Trees corresponding to code $C_1$.


b) For code $C_2$ of Example 3.1,

$$\sum_{c \in C_2} 2^{-|c|} = 3 \cdot 2^{-2} + 2 \cdot 2^{-3} = 1,$$

and so $C_2$ achieves Kraft storage. A tree for code $C_2$ is given in Figure 3.3.
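c) Recall once again the representation $\rho: \mathbb{D} \to \mathcal{B}^*$ from Examples 2.7 and 3.2. Then, since each $d \in \mathbb{D}$ has a unique representation $\rho(d)$,

$$\sum_{d \in \mathbb{D}} 4^{-|\rho(d)|} = \sum_{c \in \rho(\mathbb{D})} 4^{-|c|} = 3 \cdot 4^{-2} + 4 \cdot 4^{-3} = \tfrac{1}{4}.$$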


Figure 3.3. Tree corresponding to code $C_2$.
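For scattered codes the Kraft sum counts only the cells a codeword actually specifies, so with the dictionary model used earlier, $|c|$ is simply the number of keys. A minimal sketch follows; it assumes $C_2$ consists of the three codewords of $C_1$ together with $000$ and $111$, which is consistent with the stated lengths in Example 3.1 and with $C_1 \subset C_2$ in Example 3.5. Only the lengths affect the sums.

```python
from fractions import Fraction


def kraft_sum(code, q):
    """Exact Kraft sum over a code of partial functions: |c| counts only the
    cells c specifies, so gaps in a scattered codeword cost nothing."""
    return sum((Fraction(1, q ** len(c)) for c in code), Fraction(0))


C1 = [{0: 0, 2: 1}, {0: 1, 1: 0}, {1: 1, 2: 0}]          # Example 2.4
C2 = C1 + [{0: 0, 1: 0, 2: 0}, {0: 1, 1: 1, 2: 1}]       # assumed reading of Example 3.1

print(kraft_sum(C1, 2))  # 3/4
print(kraft_sum(C2, 2))  # 1
```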


When we solve some problem we would like to find a representation that does not result in high storage costs. We say that a representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ is optimal in storage if no other representation requires less storage for some data base without requiring more storage for another.


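Definition. A representation function $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ achieves optimal storage if and only if for any representation $\rho': \mathbb{D} \to \mathcal{B}^{\dagger}$

$$(\exists d_i \in \mathbb{D})\,(|\rho'(d_i)| < |\rho(d_i)|) \Rightarrow (\exists d_j \in \mathbb{D})\,(|\rho'(d_j)| > |\rho(d_j)|).$$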


Thus, we use the term optimal storage for a representation if no other
representation can uniformly do better. There may, of course, be many
representations that are storage optimal, and which would be preferred depends on 
the particular problem and is conditional on the probabilities of the various data 
bases in ID. In fact, one might not choose to use a storage optimal representation at 
all if such a representation resulted in higher access or other system costs. However, 
these involve details of particular problems and, for the general framework we are 
considering, we shall not usually prefer one optimal representation over another. 

If a representation $\rho$ meets the Kraft sum with equality, then $\rho$ is storage optimal. This condition makes it easy to recognize certain storage-optimal representations.


Theorem 3.2. Consider the representation function $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$. If

$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = 1,$$

then $\rho$ is storage optimal. In other words, if $\rho$ achieves Kraft storage then $\rho$ is storage optimal.


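Proof: If $\rho$ is not storage optimal, then there exists some representation $\rho': \mathbb{D} \to \mathcal{B}^{\dagger}$ such that $(\forall d \in \mathbb{D})(|\rho'(d)| \le |\rho(d)|)$ and $(\exists d_j \in \mathbb{D})(|\rho'(d_j)| < |\rho(d_j)|)$. But this says that

$$\begin{aligned}
1 &= \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|}
   = \sum_{d \in \mathbb{D}-\{d_j\}} |\mathcal{B}|^{-|\rho(d)|} + |\mathcal{B}|^{-|\rho(d_j)|} \\
  &< \sum_{d \in \mathbb{D}-\{d_j\}} |\mathcal{B}|^{-|\rho'(d)|} + |\mathcal{B}|^{-|\rho'(d_j)|}
   = \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho'(d)|},
\end{aligned}$$

which contradicts the Kraft inequality of Theorem 3.1.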


Example 3.4. Recall Example 2.7, where $\mathcal{B} = \{0,1,2,\square\}$, and consider the alternative encoding $\rho_2: \mathbb{D} \to \mathcal{B}^*$ defined by


pz(d_) = 0 pd) a ol 
pld,) = pzld_) = 02 
pd.) = 00 

and also the encoding $\rho_3: \mathbb{D} \to \mathcal{B}^*$ defined by
pa(d,) = 0 pa(d,) = 2 
p3ld2) = 02 Paldg) = Ol 
f,(d,) = 00 


By Theorem 3.2, both $\rho_2$ and $\rho_3$ are storage optimal because

$$\sum_{d \in \mathbb{D}} 4^{-|\rho_2(d)|} = \sum_{d \in \mathbb{D}} 4^{-|\rho_3(d)|} = 1.$$


On the other hand, $\rho$ as defined in Example 2.7 is not storage optimal because $\rho_2$ does better; in fact, $\rho_2$ takes less storage everywhere:

$$(\forall d \in \mathbb{D})\,(|\rho_2(d)| < |\rho(d)|).$$

The representation $\rho_3$ also does better than $\rho$, because it never uses more storage and sometimes uses less.

If we were forced to pay a very high price for storage, we would probably choose to solve the problem $(F, \mathbb{D})$ of Example 2.1 using representation $\rho_2$ or $\rho_3$ rather than $\rho$. However, $\rho$ corresponds to a simple ternary representation (with $\square$ serving as an endmarker) and might be more desirable than $\rho_2$ or $\rho_3$ in terms of other costs.


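We have seen that a code $\rho(\mathbb{D})$ achieves optimal storage if we get equality in the Kraft sum. Let us examine the conditions under which this equality is attained. We first define a distinguishable code $C \subset \mathcal{B}^{\dagger}$ to be complete if and only if for all $c' \in \mathcal{B}^{\dagger}$, $C \cup \{c'\}$ is not distinguishable. Elias [8] has shown that a finite distinguishable code $C \subset \mathcal{B}^{\dagger}$ is complete if and only if the $L$-closures of its members partition $\mathcal{B}^L$ (for $L > \max D(C)$), which is the case if $\sum_{c \in C} |\mathcal{B}|^{-|c|} = 1$. The converse is not true in general; i.e., a code $C$ may be complete even if $\sum_{c \in C} |\mathcal{B}|^{-|c|} \ne 1$ (this can happen only when $C$ is infinite, since for finite $C$ a partition of $\mathcal{B}^L$ forces the Kraft sum to equal one).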
c€&C 


Example 3.5. Recalling Example 3.3, we see that $C_1$ is not complete, since $C_1 \subset C_2$. However, $C_2$ is complete. If we look at the trees for $C_1$ and $C_2$, given in Figures 3.2 and 3.3, it is easy to see that $C_1$ does not partition $\{0,1\}^3$, since there are some leaves in the tree for $C_1$ that correspond to no codeword. Also, by Example 3.3 we know that $C_1$ does not achieve Kraft storage and thus cannot be complete (since it is finite); $C_2$ does achieve Kraft storage and is therefore complete.
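For finite codes, completeness can be checked directly through the partition criterion: the $L$-closures must be pairwise disjoint and must together cover $\mathcal{B}^L$. The sketch below (helper names hypothetical) assumes, as in Example 3.5, that $C_2$ is $C_1$ together with $000$ and $111$; it shows that exactly those two memory states are left uncovered by $C_1$.

```python
from itertools import product


def closure(b, L, alphabet):
    """Set of memories in B^L (tuples) that contain every pair of b."""
    if b and max(b) >= L:
        return set()
    return {m for m in product(alphabet, repeat=L)
            if all(m[j] == v for j, v in b.items())}


def partitions(code, L, alphabet):
    """True if the L-closures are pairwise disjoint and together cover B^L."""
    closures = [closure(c, L, alphabet) for c in code]
    total = sum(len(cl) for cl in closures)   # equals len(covered) iff disjoint
    covered = set().union(*closures)
    return total == len(covered) == len(alphabet) ** L


B = (0, 1)
C1 = [{0: 0, 2: 1}, {0: 1, 1: 0}, {1: 1, 2: 0}]
C2 = C1 + [{0: 0, 1: 0, 2: 0}, {0: 1, 1: 1, 2: 1}]

print(partitions(C1, 3, B))  # False
print(partitions(C2, 3, B))  # True
uncovered = set(product(B, repeat=3)) - set().union(*(closure(c, 3, B) for c in C1))
print(sorted(uncovered))     # [(0, 0, 0), (1, 1, 1)]
```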


ee ee 


We can conclude that, as illustrated in the above example, a finite $|\mathcal{B}|$-ary code $C$ is complete if and only if every leaf in a full $|\mathcal{B}|$-ary tree for $C$ corresponds to some codeword $c \in C$.

Using the terminology of representations, we can show that if a representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ achieves Kraft storage, then $\rho(\mathbb{D})$ is complete.


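Theorem 3.3. Let $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ be some representation which achieves Kraft storage. Then for all $b \in \mathcal{B}^{\dagger}$, there is some $d \in \mathbb{D}$ such that $b$ and $\rho(d)$ are not distinguishable; i.e., $\bar{b} \cap \bar{\rho}(d) \ne \emptyset$.

Proof: Let $\rho$ achieve Kraft storage and assume that there is some $b_0 \in \mathcal{B}^{\dagger}$ such that, for all $d \in \mathbb{D}$, $\bar{b}_0 \cap \bar{\rho}(d) = \emptyset$. In other words, $b_0$ and $\rho(d)$ are distinguishable for every $d$, so the code $\rho(\mathbb{D}) \cup \{b_0\}$ is distinguishable. Then, by Theorem 3.1,

$$1 \ge \sum_{c \in \rho(\mathbb{D}) \cup \{b_0\}} |\mathcal{B}|^{-|c|} = \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} + |\mathcal{B}|^{-|b_0|} > \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|},$$

which contradicts the fact that $\rho$ achieves Kraft storage.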


The converse is not true (see, once again, Elias [8]). However, if $\rho(\mathbb{D})$ is complete for $\mathbb{D}$ finite, then we do know that $\rho$ achieves Kraft storage.

Let us briefly mention two results concerning worst-case and average storage costs. The first result follows from well-known tree properties (see, e.g., Gallager [12]) and states that for any representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ there is some data base whose representation specifies values for at least $\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil$ memory cells. On the other hand, for any domain $\mathbb{D}$ there is some representation which never requires more than $\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil$ memory cells.


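Theorem 3.4. (Elias [6]). (i) For any representation function $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$,

$$\max_{d \in \mathbb{D}} |\rho(d)| \ge \lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil.$$

(ii) There is some representation function $\rho: \mathbb{D} \to \mathcal{B}^*$ such that

$$\max_{d \in \mathbb{D}} |\rho(d)| = \lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil.$$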


This result can be interpreted in terms of any tree corresponding to the $|\mathcal{B}|$-ary distinguishable code $\rho(\mathbb{D})$, where there must be at least $|\mathbb{D}|$ leaves (since $\rho$ is one-to-one). Since the tree is $|\mathcal{B}|$-ary, the depth of the tree (i.e., the length of the longest codeword) must be at least $\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil$. Also, a complete, full $|\mathcal{B}|$-ary tree with $|\mathbb{D}|$ leaves has all of its leaves at either depth $\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil$ or depth $\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil - 1$.
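Both parts of Theorem 3.4 come down to the integer $\lceil \log_{|\mathcal{B}|} |\mathbb{D}| \rceil$. The following sketch (our own; helper names hypothetical) computes it with integer arithmetic, avoiding floating-point logarithms, and builds a fixed-length code meeting the bound, which is one simple way to realize part (ii). The alphabet string `"012#"` again uses `#` as a stand-in for the endmarker of Example 2.7.

```python
from itertools import product


def ceil_log(n, q):
    """Smallest k with q**k >= n, i.e. ceil(log_q n), for n >= 1 and q >= 2."""
    k, power = 0, 1
    while power < n:
        power *= q
        k += 1
    return k


def fixed_length_code(n, alphabet):
    """n distinct codewords, all of length ceil_log(n, |alphabet|); equal
    lengths make the code prefix-free automatically."""
    k = ceil_log(n, len(alphabet))
    return ["".join(w) for w in product(alphabet, repeat=k)][:n]


code = fixed_length_code(7, "012#")  # seven data bases, four-letter alphabet
print(ceil_log(7, 4), code)          # 2 ['00', '01', '02', '0#', '10', '11', '12']
```

For the seven data bases of Example 2.7 this gives a worst-case storage of two cells, compared with three cells for the ternary-with-endmarker representation $\rho$.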

The second result involves average storage costs. There will be occasions where we wish to consider some sort of probability distribution $P$ on the members of our domain $\mathbb{D}$:

$$P(d) \triangleq \text{the fraction of time a user expects to consider data base } d \in \mathbb{D}.$$

Thus, it makes sense to look at the average storage cost

$$\sum_{d \in \mathbb{D}} P(d) \cdot |\rho(d)|.$$
We can use a procedure such as Huffman encoding [13], [12] to construct a representation $\rho$ for which very probable data bases have short representations and less probable data bases have longer representations. Other preconstructed universal codes perform almost as well as Huffman codes, provided the shorter preconstructed representations are assigned to the more probable data bases (see Elias [7]).
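Theorem 3.5. (Elias [6]). Consider a domain $\mathbb{D}$ and assume there is some probability distribution $P$ on $\mathbb{D}$. Define the entropy $H(\mathbb{D})$ by

$$H(\mathbb{D}) = -\sum_{d \in \mathbb{D}} P(d) \log_{|\mathcal{B}|} P(d).$$

(i) For any representation function $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$, the average storage cost satisfies

$$\sum_{d \in \mathbb{D}} P(d) \cdot |\rho(d)| \ge H(\mathbb{D}).$$

(ii) There is some representation function $\rho: \mathbb{D} \to \mathcal{B}^*$ such that

$$\sum_{d \in \mathbb{D}} P(d) \cdot |\rho(d)| < H(\mathbb{D}) + 1.$$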
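Huffman's procedure gives one concrete way to meet part (ii). The sketch below assumes a binary machine alphabet ($|\mathcal{B}| = 2$) and an illustrative distribution of our own choosing; it computes only the codeword lengths, since those alone determine the average storage cost, and then checks the entropy bounds of Theorem 3.5. A binary Huffman tree is always full, so the resulting code also achieves Kraft storage.

```python
import heapq
import math
from itertools import count


def huffman_lengths(probs):
    """Codeword lengths of a binary Huffman code for the given probabilities."""
    if len(probs) == 1:
        return [0]  # a single data base needs no cells at all
    tie = count()  # tie-breaker so heapq never has to compare the lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for i in group1 + group2:  # each merge puts these leaves one level deeper
            lengths[i] += 1
        heapq.heappush(heap, (p1 + p2, next(tie), group1 + group2))
    return lengths


def entropy(probs, q=2):
    """H = -sum P log_q P, skipping zero-probability data bases."""
    return -sum(p * math.log(p, q) for p in probs if p > 0)


P = [0.4, 0.2, 0.2, 0.1, 0.1]  # illustrative distribution on five data bases
lengths = huffman_lengths(P)
avg = sum(p * l for p, l in zip(P, lengths))
H = entropy(P)
print(lengths, round(avg, 3), round(H, 3))
assert H <= avg < H + 1                      # the bounds of Theorem 3.5
assert sum(2.0 ** -l for l in lengths) == 1.0  # Huffman trees are full
```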
3.3 Access Costs


A user is necessarily concerned with the amount of time it takes to perform an operation $f$ on some $d \in \mathbb{D}$. The number of memory cell accesses made by an algorithm before halting is one direct indication of the performance time. This memory access measure has been used by Minsky and Papert [19] and Elias [5]. The number of accesses made to memory will depend not only on the algorithm used but also on the particular data base which is stored.

There are various ways in which we could define an access, but we use the
notion commonly used in Turing machine theory. A machine or algorithm reads a
cell and, depending on that cell's contents, may rewrite the value stored there; this 
corresponds to only one access. We also choose to allow an algorithm to possibly 
read a cell in another user's memory space, but the algorithm certainly cannot 


rewrite such a cell (without being charged for it in storage). 


Definition. Consider a system $(\Omega, \rho)$ solving a problem $(F, \mathbb{D})$. A memory cell access is made each time $\Omega$ moves to a new cell. Once $\Omega$ references a cell, it may read and/or rewrite the cell contents; this constitutes a single access.


Depending on the hardware of an actual machine, this reading and then rewriting 
action might require two accesses, in which case our results could be off by a factor 
of two. Flower [11] has investigated update costs and shown that it is necessary for
an access measure to involve both reads and writes; considering either reads or 
writes alone does not give reasonable lower bounds. 

We present the following example in order to illustrate some of the 
terminology we shall use when we discuss the implementation of a function. We 
frequently find it convenient to describe an algorithm using a program-like 


description. 




Example 3.6. Recall Examples 2.1 and 2.7 and consider the problem of performing the update operation $u_2$ on some data base $d \in \mathbb{D}$. The following algorithm, $\Omega_{u_2}$, performs the update. (For simplicity, we do not here consider the question component of the function $f_2$.)


Ω_{u_2}:  if m(0) = 0 then return
          if m(0) = 1 then if m(1) = 0 then m(1) ← □
                                            return
                           if m(1) = 1 then m(1) ← □
                                            m(0) ← 2
                                            return
                           if m(1) = 2 then m(1) ← □
                                            m(0) ← 2
                                            return
                           if m(1) = □ then m(0) ← 0
                                            return
          if m(0) = 2 then m(0) ← 1
                           return


For instance, suppose we have $\rho(d_0)$ in memory. Given that we know there is some $\rho(d_i)$ stored, when we access cell 0 and discover that $m(0) = 0$, we know that it is $d_0$ that is stored. Since $u_2(d_0) = d_0$, we do not need to rewrite any memory cells. Thus, performing the $u_2$ operation on $\rho(d_0)$ using algorithm $\Omega_{u_2}$ involves only a reading of cell 0.

Suppose $d_5$ is stored in memory with representation $\rho$. Using algorithm $\Omega_{u_2}$ we first access cell 0. Since $m(0) = 1$, we next access cell 1. Since $m(1) = 2$, we rewrite cell 1, setting it to the new value $\square$, and then backtrack and set $m(0) \leftarrow 2$.
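The algorithm can be exercised mechanically. The Python sketch below is ours, not the thesis's: it writes the endmarker as `#`, models memory as a list of symbols, and logs accesses so that reading a cell and then rewriting it without moving counts once, as in the access definition above. Tracing the branches suggests that $\Omega_{u_2}$ turns a state storing $\rho(d_n)$ into one storing $\rho(d_{\lfloor n/2 \rfloor})$; this is our inference from the algorithm, not a statement taken from Example 2.1. The loop checks that reading, which is condition (2) of Section 2.6, for every way other users might fill the don't-care cells of a 3-cell memory.

```python
import itertools

END = "#"  # stand-in for the endmarker symbol of Example 2.7
ALPHABET = ["0", "1", "2", END]
TERNARY = ["0", "1", "2", "10", "11", "12", "20"]  # d_0, ..., d_6
RHO = {n: t + END for n, t in enumerate(TERNARY)}   # rho(d_n) as a string
L = 3                                               # memory size


class TracedMemory:
    """Memory cells plus an access log. Reading a cell and then rewriting it
    without moving counts as one access."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.trace = []  # one [address, wrote] entry per access

    def _visit(self, j):
        if not self.trace or self.trace[-1][0] != j:
            self.trace.append([j, False])  # moved to a new cell: new access

    def read(self, j):
        self._visit(j)
        return self.cells[j]

    def write(self, j, v):
        self._visit(j)
        self.trace[-1][1] = True
        self.cells[j] = v


def omega_u2(m):
    """Branch-for-branch transcription of the update algorithm of Example 3.6."""
    c0 = m.read(0)
    if c0 == "0":
        return
    if c0 == "1":
        c1 = m.read(1)
        if c1 == "0":
            m.write(1, END)
        elif c1 in ("1", "2"):
            m.write(1, END)
            m.write(0, "2")
        else:  # c1 is the endmarker
            m.write(0, "0")
        return
    if c0 == "2":
        m.write(0, "1")


def stores(cells, codeword):
    """True if the memory contents begin with the given prefix codeword."""
    return all(cells[j] == v for j, v in enumerate(codeword))


def access_string(trace):
    """Access sequence with written cells marked '*' (underlined in the text)."""
    return " ".join(f"{j}*" if wrote else str(j) for j, wrote in trace)


for n, code in RHO.items():
    # let "other users" fill the don't-care cells in every possible way
    for fill in itertools.product(ALPHABET, repeat=L - len(code)):
        m = TracedMemory(code + "".join(fill))
        omega_u2(m)
        assert stores(m.cells, RHO[n // 2])  # new state represents d_{n // 2}
    print(f"d_{n}: {access_string(m.trace)}  ({len(m.trace)} accesses)")
```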


Because we spend a great deal of time discussing algorithms for performing various operations, we find it convenient to make some notational definitions for dealing with memory access costs.


Definition. Suppose a system $(\Omega, \rho)$ solves a problem $(F, \mathbb{D})$. Then for each $d \in \mathbb{D}$ we can define the following.

$[\Omega_i(\rho(d))] \triangleq$ the sequence of memory cell accesses made by algorithm $\Omega_i$ in computing $f_i(d)$ using representation $\rho$.

$\#[\Omega_i(\rho(d))] \triangleq |[\Omega_i(\rho(d))]|$, the number of memory cell accesses made by algorithm $\Omega_i$ in computing $f_i(d)$ using representation $\rho$.

$\{[\Omega_i(\rho(d))]\} \triangleq$ the set of memory cells accessed by algorithm $\Omega_i$ in computing $f_i(d)$ using representation $\rho$; i.e., the access set for $f_i(d)$ corresponding to algorithm $\Omega_i$.


We may sometimes write $[f_i(\rho(d))]$ to denote the access sequence which an algorithm $\Omega_i$ uses to compute $f_i(\rho(d))$. We refer back to Example 3.6 to illustrate the above definition.


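Example 3.7. Recall the algorithm $\Omega_{u_2}$ of Example 3.6. In computing $u_2(d_5)$, $\Omega_{u_2}$ first reads cell 0, then reads and rewrites cell 1, and then backtracks and writes cell 0. Thus, the access sequence is 0, 1, 0. For notational convenience, when we give an access sequence we shall underline any memory cell accesses which correspond to writes:

$$\begin{aligned}
[\Omega_{u_2}(\rho(d_0))] &= 0, & [\Omega_{u_2}(\rho(d_1))] &= 0\,1\,\underline{0},\\
[\Omega_{u_2}(\rho(d_2))] &= \underline{0}, & [\Omega_{u_2}(\rho(d_3))] &= 0\,\underline{1},\\
[\Omega_{u_2}(\rho(d_4))] &= 0\,\underline{1}\,\underline{0}, & [\Omega_{u_2}(\rho(d_5))] &= 0\,\underline{1}\,\underline{0},\\
[\Omega_{u_2}(\rho(d_6))] &= \underline{0}.
\end{aligned}$$

Then for the number of memory cell accesses in each case we clearly have:

$$\begin{aligned}
\#[\Omega_{u_2}(\rho(d_0))] &= \#[\Omega_{u_2}(\rho(d_2))] = \#[\Omega_{u_2}(\rho(d_6))] = 1,\\
\#[\Omega_{u_2}(\rho(d_1))] &= \#[\Omega_{u_2}(\rho(d_4))] = \#[\Omega_{u_2}(\rho(d_5))] = 3,\\
\#[\Omega_{u_2}(\rho(d_3))] &= 2.
\end{aligned}$$

Note also that the access sets are just:

$$\begin{aligned}
\{[\Omega_{u_2}(\rho(d_0))]\} &= \{[\Omega_{u_2}(\rho(d_2))]\} = \{[\Omega_{u_2}(\rho(d_6))]\} = \{0\},\\
\{[\Omega_{u_2}(\rho(d_1))]\} &= \{[\Omega_{u_2}(\rho(d_3))]\} = \{[\Omega_{u_2}(\rho(d_4))]\} = \{[\Omega_{u_2}(\rho(d_5))]\} = \{0,1\}.
\end{aligned}$$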


Since our algorithms are sequential and deterministic, we find it convenient to model them by access trees. Access trees are basically the same as the trees we used in Section 3.2, where each internal node corresponds to a memory cell access. An access tree corresponding to the algorithm for a question $q$ will label each leaf by the appropriate answer $q(d)$, if there is one. We speak of the access tree for $q_i$ (or $u_i$) to mean the access tree for an algorithm $\Omega_i$ solving $q_i$ (or $u_i$).


Example 3.8. Consider the static problem $(F, \mathbb{D})$ where $F = \{f_1, f_2\}$ and $\mathbb{D} = \{d_0, d_1, d_2\}$. Define the representation function $\rho: \mathbb{D} \to \{0,1\}^{\dagger}$ by:

$$\rho(d_0) = 0\_0, \qquad \rho(d_1) = 1\_0, \qquad \rho(d_2) = \_\_1.$$

Let $q_1$ and $q_2$ be defined as follows:

$$q_1(d_0) = a, \quad q_1(d_1) = q_1(d_2) = b; \qquad q_2(d_0) = q_2(d_1) = a, \quad q_2(d_2) = b,$$

where $a$ and $b$ are distinct answers. An access tree corresponding to the obvious algorithm for $q_1$ is given in Figure 3.4a. Notice that, in fact, two accesses are necessary to distinguish $\rho(d_0)$ from $\rho(d_1)$ or $\rho(d_2)$, and thus two accesses are required to reach the leaf that can be labelled $a$. Question $q_2$, however, can be answered after a single access, to cell 2.


Figure 3.4. Access trees for (a) $q_1$ and (b) $q_2$ of Example 3.8.


Each output corresponds to some leaf on the access tree for $q_i$, and we define $\alpha_i(r)$ to be the minimum depth of any leaf labelled $r$.


Definition. Suppose a system $(\Omega, \rho)$ solves a static problem $(Q, \mathbb{D})$, and let $\mathbb{D}_i(r) = \{d \in \mathbb{D} \mid q_i(d) = r\}$. Then

$$\alpha_i(r) \triangleq \min_{d \in \mathbb{D}_i(r)} \#[\Omega_i(\rho(d))].$$


Similar to our storage result, we have a Kraft inequality for access. 


Theorem 3.6. (Elias [6]). If the $|\mathcal{B}|$-ary system $(\Omega, \rho)$ solves a static problem $(Q, \mathbb{D})$, then for all $q_i \in Q$:

$$\sum_{r \in q_i(\mathbb{D})} |\mathcal{B}|^{-\alpha_i(r)} \le 1. \qquad (3.4)$$


Corresponding to each answer $r \in q_i(\mathbb{D})$, the range of $q_i$, there is one term in the summation with negative exponent $\alpha_i(r)$. This theorem is a statement about distributions on the numbers of accesses needed to return the answers $r \in R_i$ and tells us that not all answers in $q_i(\mathbb{D})$ can have short retrieval times. In fact, equation (3.4) can be strengthened; it holds not only for $\alpha_i(r)$, the minimum number of accesses to return the value $r$, but also for the number of accesses to return the value $r$ for any $d \in \mathbb{D}_i(r)$. In other words, if we choose any $d_r \in \mathbb{D}_i(r)$ for each $r$, then we have

$$\sum_{r \in q_i(\mathbb{D})} |\mathcal{B}|^{-\#[\Omega_i(\rho(d_r))]} \le 1.$$


Definition. Suppose a $|\mathcal{B}|$-ary system $(\mathbf{Q}, \rho)$ solves a static problem $(\mathcal{Q}, \mathbb{D})$. Then $Q_i$ is said to achieve Kraft access if

$$\sum_{r \in q_i(\mathbb{D})} |\mathcal{B}|^{-\alpha_i(r)} = 1. \qquad (3.5)$$

In fact, if (3.5) holds and $(\mathbf{Q}, \rho)$ is understood, we shall frequently say simply that $q_i$ achieves Kraft access.


If we “assume $q_i$ achieves Kraft access”, we mean that we are considering some system $(\mathbf{Q}, \rho)$ where $Q_i$ achieves Kraft access and answers $q_i$ on domain $\mathbb{D}$.




In accessing a cell we read some $b_i \in \mathcal{B}$. Information-theoretically, one access distinguishes among $|\mathcal{B}|$ possibilities, and if it is not the case that each of these $|\mathcal{B}|$ possible cell contents leads to a different answer, then we have in some sense obtained more information than is needed. Thus, if an algorithm $Q_i$ achieves Kraft access, then its access tree must be a full tree where every leaf corresponds to a distinct $r \in R$. In particular, we have the following result.


Theorem 3.7. Suppose a system $(\mathbf{Q}, \rho)$ solves a problem $(\mathcal{Q}, \mathbb{D})$. If $Q_i$ achieves Kraft access, then for all $d_1, d_2 \in \mathbb{D}_i(r)$,

$$\#[Q_i(\rho(d_1))] = \#[Q_i(\rho(d_2))].$$
Let's look again at the problem from the previous example. 


Example 3.9. Recall Example 3.8, and let $R_1$ and $R_2$ denote $q_1(\mathbb{D})$ and $q_2(\mathbb{D})$, respectively. For $q_1$:

$$\sum_{r \in R_1} 2^{-\alpha_1(r)} = \sum_{r \in \{a,b\}} 2^{-\alpha_1(r)} = 2^{-2} + 2^{-1} = \tfrac{3}{4} < 1.$$

Notice that the access tree for $q_1$ in Figure 3.4a does not have a distinct label for each leaf and so cannot achieve Kraft access. For $q_2$:

$$\sum_{r \in R_2} 2^{-\alpha_2(r)} = 2^{-1} + 2^{-1} = 1,$$

and so $q_2$ does achieve Kraft access, which is what we would expect by observing Figure 3.4b. ∎
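To make $\alpha_i(r)$ and Kraft access concrete, here is a minimal Python sketch that evaluates Example 3.8. The nested-tuple encoding of access trees is chosen here for illustration only, and the tree for $q_1$ (read cell 2, then cell 1 if needed) is an assumed form of the "obvious algorithm", since Figure 3.4a is not reproduced.

```python
from fractions import Fraction

# An access tree is ("leaf", answer) or ("node", cell, {cell_value: subtree}).

def leaf(r):
    return ("leaf", r)

def run(tree, memory):
    """Follow the tree on a memory state (dict cell -> bit); return (answer, accesses)."""
    accesses = 0
    while tree[0] == "node":
        _, cell, children = tree
        accesses += 1
        tree = children[memory[cell]]
    return tree[1], accesses

def min_depths(tree, depth=0, out=None):
    """alpha(r): the minimum depth of any leaf labelled r."""
    if out is None:
        out = {}
    if tree[0] == "leaf":
        r = tree[1]
        out[r] = min(out.get(r, depth), depth)
    else:
        for child in tree[2].values():
            min_depths(child, depth + 1, out)
    return out

def kraft_access_sum(tree, base=2):
    """Left-hand side of (3.4)/(3.5) for this tree."""
    return sum(Fraction(1, base ** a) for a in min_depths(tree).values())

# Example 3.8 on cells 1 and 2: rho(d0) = 00, rho(d1) = 10, rho(d2) = _1
rho = {"d0": {1: 0, 2: 0}, "d1": {1: 1, 2: 0}, "d2": {2: 1}}
q1_tree = ("node", 2, {1: leaf("b"), 0: ("node", 1, {0: leaf("a"), 1: leaf("b")})})
q2_tree = ("node", 2, {0: leaf("a"), 1: leaf("b")})

print(kraft_access_sum(q1_tree), kraft_access_sum(q2_tree))  # 3/4 1
```

The two sums reproduce the calculation above: $q_1$ has two leaves labelled $b$, so its sum falls short of 1.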


As we did for storage costs, we define an implementation or algorithm to be optimal in access if no other implementation of the operation requires fewer accesses for some data base representation without requiring more accesses for some other data base representation.
Definition. An implementation $(Q_i, \rho)$ is access optimal if and only if for any other implementation $(Q_i', \rho')$:

$$(\exists d_1 \in \mathbb{D})\big[\#[Q_i'(\rho'(d_1))] < \#[Q_i(\rho(d_1))]\big] \Rightarrow (\exists d_2 \in \mathbb{D})\big[\#[Q_i'(\rho'(d_2))] > \#[Q_i(\rho(d_2))]\big].$$


Analogous to our result for Kraft storage, if $Q_i$ achieves Kraft access then $Q_i$ is access optimal.


Theorem 3.8. Suppose the $|\mathcal{B}|$-ary system $(\mathbf{Q}, \rho)$ solves the static problem $(\mathcal{Q}, \mathbb{D})$. If

$$\sum_{r \in q_i(\mathbb{D})} |\mathcal{B}|^{-\alpha_i(r)} = 1,$$

then $Q_i$ is access optimal.


Unless we allow the trivial question, which always returns the same value no matter what data base is stored in memory, it is always necessary to make at least one access to answer a question.


Theorem 3.9. Given any implementation $(Q_i, \rho)$, assume that $Q_i(\rho(d))$ is not a constant function. Then, for all $d \in \mathbb{D}$,

$$\#[Q_i(\rho(d))] \ge 1.$$

Corollary 3.9.1. If $\#[Q_i(\rho(d))] = 1$ for all $d \in \mathbb{D}$, then $Q_i$ is access optimal.


If $|R| < |\mathcal{B}|$, then when we access one cell we can distinguish $|\mathcal{B}|$ characters, whereas we only have $|R|$ distinct answers. Therefore we have in some sense obtained more information than we can use, giving us a strict inequality in the Kraft sum, as the next theorem shows.


Theorem 3.10. Consider a $|\mathcal{B}|$-ary system $(\mathbf{Q}, \rho)$ which answers the question $q: \mathbb{D} \to R$. If $Q$ achieves Kraft access, then $|R| \ge |\mathcal{B}|$.


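Proof: Assume $Q$ meets Kraft with equality. Then by Theorem 3.9 it is always the case that $\alpha(r) \ge 1$, and so

$$1 = \sum_{r \in R} |\mathcal{B}|^{-\alpha(r)} \le \sum_{r \in R} |\mathcal{B}|^{-1} = \frac{|R|}{|\mathcal{B}|}.$$

If $|R| < |\mathcal{B}|$ then we get a contradiction. ∎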


Notice that this theorem does not depend on the representation used.

Assume we have an implementation that achieves Kraft access for some set $\mathcal{Q}$ of questions. This then tells us something about the possible relative range sizes of questions in $\mathcal{Q}$. We first recall a lemma about trees (see, e.g., Knuth [14]).


Lemma 3.1. There is a full $|\mathcal{B}|$-ary tree with $k$ leaves if and only if there is some $n \in \mathbb{N}$ such that $k = (|\mathcal{B}| - 1) \cdot n + 1$. (The number $n$ corresponds to the number of internal nodes in the tree.)


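From this lemma, and recalling that we have equality in the Kraft sum only when the exponents correspond to the depths of the leaves in a full tree, we have the following theorem.

Theorem 3.11. (Gallager [12]). Let $J \subseteq \mathbb{N}$ be finite, and let $a_i \in \mathbb{N}$ for each $i \in J$. If

$$\sum_{i \in J} |\mathcal{B}|^{-a_i} = 1,$$

then $|J| = n \cdot (|\mathcal{B}| - 1) + 1$ for some $n \in \mathbb{N}$.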
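Lemma 3.1 and Theorem 3.11 can be sanity-checked by brute force. The Python sketch below (the function name is hypothetical) enumerates multisets of depths $a_i \ge 1$, up to a fixed depth and size, whose Kraft sum is exactly 1, and confirms that every such multiset has a size of the form $n(|\mathcal{B}| - 1) + 1$. It searches only a bounded region, so it illustrates the theorem rather than proving it.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

def kraft_complete_sizes(base, max_depth, max_leaves):
    """Sizes |J| of depth multisets {a_i}, 1 <= a_i <= max_depth, with sum base**(-a_i) == 1."""
    sizes = set()
    for k in range(1, max_leaves + 1):
        for depths in combinations_with_replacement(range(1, max_depth + 1), k):
            if sum(Fraction(1, base ** a) for a in depths) == 1:
                sizes.add(k)
    return sorted(sizes)

for base in (2, 3, 4):
    sizes = kraft_complete_sizes(base, max_depth=3, max_leaves=10)
    # Theorem 3.11: every size found must satisfy |J| = n * (base - 1) + 1
    assert all((k - 1) % (base - 1) == 0 for k in sizes)
    print(base, sizes)
```

For $|\mathcal{B}| = 3$, for instance, only odd sizes appear, since $(|J| - 1)$ must be a multiple of 2.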
This now tells us something about the possible pairwise relative sizes of the ranges 


of questions that each achieve Kraft access. 


Theorem 3.12. Consider a $|\mathcal{B}|$-ary system $(\mathbf{Q}, \rho)$ which answers the questions $q_1: \mathbb{D} \to R_1$ and $q_2: \mathbb{D} \to R_2$, and assume both $q_1$ and $q_2$ achieve Kraft access. Then there is some integer $n$ such that $|R_1| - |R_2| = n \cdot (|\mathcal{B}| - 1)$.


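Proof: Since both $q_1$ and $q_2$ achieve Kraft access,

$$\sum_{r \in R_1} |\mathcal{B}|^{-\alpha_1(r)} = 1 \quad \text{and} \quad \sum_{r \in R_2} |\mathcal{B}|^{-\alpha_2(r)} = 1.$$

By Theorem 3.11 we thus know that there exist $n_1, n_2 \in \mathbb{N}$ such that

$$|R_1| = 1 + n_1(|\mathcal{B}| - 1) \quad \text{and} \quad |R_2| = 1 + n_2(|\mathcal{B}| - 1).$$

Therefore, $|R_1| - |R_2| = (n_1 - n_2)(|\mathcal{B}| - 1)$. ∎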


We find this theorem useful for some of the results we shall prove later. 

As was the case when we discussed storage, it is difficult to understand what the Kraft inequality of Theorem 3.6 tells us about access costs of interest to the user, except when we actually do achieve Kraft access. Thus, we mention two results concerning access costs; these correspond to the storage Theorems 3.4 and 3.5.

First, if we need to distinguish $|R_i|$ answers with a $|\mathcal{B}|$-ary tree, it is clear that the access tree must have maximum depth at least $\lceil \log_{|\mathcal{B}|} |R_i| \rceil$. Also, it is always possible to answer a question $q_i$ in such a way that the corresponding access tree has maximum depth exactly $\lceil \log_{|\mathcal{B}|} |R_i| \rceil$.
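Theorem 3.13. (Elias [6]). Consider a problem $(\mathcal{Q}, \mathbb{D})$.

(i) If the $|\mathcal{B}|$-ary system $(\mathbf{Q}, \rho)$ answers the question $q_i$, then

$$\max_{r \in R_i} \alpha_i(r) \ge \lceil \log_{|\mathcal{B}|} |R_i| \rceil.$$

(ii) There is some $|\mathcal{B}|$-ary system $(\mathbf{Q}, \rho)$ that answers question $q_i$ such that

$$\max_{r \in R_i} \alpha_i(r) = \lceil \log_{|\mathcal{B}|} |R_i| \rceil.$$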


The bound in (ii) can be attained by using a representation $\rho$ which stores in memory the answers to each question in $\mathcal{Q}$. Thus, to answer $q_i$, $Q_i$ simply reads the $i$-th answer (see Elias and Flower [10]).
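A minimal Python sketch of this idea, under the assumption (not spelled out in the text) that the $i$-th answer is stored as a fixed-length base-$|\mathcal{B}|$ numeral occupying $\lceil \log_{|\mathcal{B}|} |R_i| \rceil$ cells. Reading it back always costs exactly that many accesses, which meets bound (ii). All function names are hypothetical.

```python
def digits_needed(num_answers, base):
    """Smallest L with base**L >= num_answers, i.e. ceil(log_base |R|), in exact integers."""
    length, capacity = 0, 1
    while capacity < num_answers:
        capacity *= base
        length += 1
    return length

def encode_answer(index, length, base):
    """Store answer number `index` as `length` base-`base` digits, most significant first."""
    cells = []
    for _ in range(length):
        index, digit = divmod(index, base)
        cells.append(digit)
    return cells[::-1]

def read_answer(cells, base):
    """Recover the answer number; every cell is read once, so accesses == len(cells)."""
    value, accesses = 0, 0
    for digit in cells:
        value = value * base + digit
        accesses += 1
    return value, accesses

answers = ["a", "b", "c", "d", "e"]        # a hypothetical range R_i with |R_i| = 5
L = digits_needed(len(answers), 2)          # 3 binary cells
cells = encode_answer(answers.index("e"), L, 2)
index, accesses = read_answer(cells, 2)
print(cells, answers[index], accesses)      # [1, 0, 0] e 3
```

Integer arithmetic is used instead of `math.log` so that exact powers of $|\mathcal{B}|$ do not suffer floating-point rounding.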

If there is some known probability distribution $P$ on $\mathbb{D}$, this induces a probability distribution $P_i$ on $R_i$, defined by

$$P_i(r) = \sum_{d \in \mathbb{D}_i(r)} P(d),$$

where $\mathbb{D}_i(r) = \{d \in \mathbb{D} \mid q_i(d) = r\}$.


Theorem 3.14. Consider a problem $(\mathcal{Q}, \mathbb{D})$, and assume there is a probability distribution $P$ on $\mathbb{D}$. Define the entropy $H(R_i)$ by

$$H(R_i) = -\sum_{r \in R_i} P_i(r) \log_{|\mathcal{B}|} P_i(r).$$

(i) If the $|\mathcal{B}|$-ary system $(\mathbf{Q}, \rho)$ answers the question $q_i$, then

$$\sum_{r \in R_i} P_i(r)\, \alpha_i(r) \ge H(R_i).$$

(ii) There is some $|\mathcal{B}|$-ary system $(\mathbf{Q}, \rho)$ that answers question $q_i$ such that

$$\sum_{r \in R_i} P_i(r)\, \alpha_i(r) < H(R_i) + 1.$$
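Part (ii) can be illustrated with the standard Shannon-code construction, which is not necessarily the argument used in the original proof: give answer $r$ depth $\lceil -\log_{|\mathcal{B}|} P_i(r) \rceil$. These depths satisfy the Kraft inequality, so a $|\mathcal{B}|$-ary access tree with them exists, and their expected value lies in $[H(R_i), H(R_i) + 1)$. The Python sketch below checks this for a sample distribution; the function names and the distribution are hypothetical.

```python
import math
from fractions import Fraction

def shannon_depths(probs, base):
    """Depth l(r) = ceil(-log_base P(r)) for each answer r, computed exactly:
    the smallest l with base**l * P(r) >= 1.  Probabilities must be positive Fractions."""
    depths = {}
    for r, p in probs.items():
        if p <= 0:
            raise ValueError("every answer needs positive probability")
        l, scale = 0, Fraction(1)
        while scale * p < 1:
            scale *= base
            l += 1
        depths[r] = l
    return depths

def entropy(probs, base):
    """H(R_i) = -sum P_i(r) log_base P_i(r)."""
    return -sum(float(p) * math.log(float(p), base) for p in probs.values())

def check(probs, base):
    depths = shannon_depths(probs, base)
    kraft = sum(Fraction(1, base ** l) for l in depths.values())
    average = float(sum(p * depths[r] for r, p in probs.items()))
    h = entropy(probs, base)
    assert kraft <= 1                    # so a |B|-ary tree with these depths exists
    assert h - 1e-12 <= average < h + 1  # H(R_i) <= expected accesses < H(R_i) + 1
    return depths, average, h

P = {"x": Fraction(1, 2), "y": Fraction(1, 4), "z": Fraction(1, 8), "w": Fraction(1, 8)}
print(check(P, 2))
```

For this dyadic distribution the expected depth equals the entropy; for non-dyadic distributions it is strictly larger, but still below $H(R_i) + 1$.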


CHAPTER 4
THE TABLE LOOKUP QUESTION SET 


In the previous chapter we discussed what is meant by Kraft storage and access. In this chapter we shall examine more closely under what conditions Kraft storage and Kraft access can be achieved. In particular, we consider the table lookup question set and attempt to understand the implications of Kraft storage and access and to get a feel for some storage-access tradeoffs.


4.1 Definition 


If for all $i$ we know the $i$-th element in a list, then we have determined the list. Thus, in some sense this forms a complete set of questions on any domain $\mathbb{D}$, because answering these allows us to answer any other question.


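Definition. Define the table lookup question set

$$\Gamma = \{\gamma_i \mid 1 \le i \le \max_{d \in \mathbb{D}} |d|\},$$

which has as its $i$-th member the function $\gamma_i: \mathbb{D} \to X \cup \{\varnothing\}$ defined by $\gamma_i(d) = d(i)$. For $i > |d|$, we say $\gamma_i(d) \triangleq \varnothing$.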


Thus, each data base $d \in \mathbb{D}$ is mapped onto the value of its $i$-th element. When $i > |d|$, we want $\gamma_i(d)$ to return a null answer, which we denote by $\varnothing$.

Consider a system $(\mathbf{Q}, \rho)$ solving $(\Gamma, \mathbb{D})$. As was mentioned in Section 3.3, if we say that $\gamma_i$ achieves Kraft access, we mean that $Q_i$ solving $\gamma_i$ achieves Kraft access. In general, although $\gamma_i$ is defined in the problem domain, we may informally refer to $\gamma_i$ in the machine domain; in particular, we say that $\gamma_i(\rho(d))$ accesses cell $k$ to mean that $k \in \{[Q_i(\rho(d))]\}$.
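At the problem-domain level the table lookup questions are simple to state in code. The Python sketch below is illustrative only: it represents lists as strings and the null answer as `None` (both choices are assumptions), and evaluates $\Gamma$ on the domain $\{\lambda\} \cup X \cup X^2$ with $X = \{0,1\}$, the domain used in Example 4.1.

```python
from itertools import product
from typing import Optional, Sequence

NULL = None  # stands in for the null answer

def gamma(i: int, d: Sequence[str]) -> Optional[str]:
    """Table lookup question gamma_i (1-based): d(i) if i <= |d|, else the null answer."""
    if i < 1:
        raise ValueError("table lookup questions are indexed from 1")
    return d[i - 1] if i <= len(d) else NULL

# The domain {lambda} U X U X^2 with X = {0, 1}; lambda is the empty string
X = "01"
domain = [""] + ["".join(p) for n in (1, 2) for p in product(X, repeat=n)]
Gamma = range(1, max(len(d) for d in domain) + 1)   # indices 1 .. max |d|

ranges = {i: {gamma(i, d) for d in domain} for i in Gamma}
print(gamma(2, "0"), gamma(1, "0"), gamma(2, "11"))  # None 0 1
```

Because $\lambda$ belongs to this domain, every range in `ranges` contains the null answer, which is the situation analysed in Theorem 4.2 below.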


Example 4.1. Recall Example 2.3, where $\mathbb{D} = \{\lambda\} \cup X \cup X^2$ for $X = \{0,1\}$. Then we have, for instance,

v (dq) =97,0) = @ = ¥2(d5) 

v,(d,) =7,(0) =0 

¥,(d,) = 7,(0) i 

vj(dg) =7,(1L) =1= y(d,). 
Alternatively, we may informally write, using the representation $\rho$ given in Example 2.7:
¥(pld,)) = -7,(200) =1 = ¥,( pld,)) I 
If we are going to achieve Kraft access for all questions in the table lookup question set, then for $|\mathcal{B}| > 2$ the ranges of all the questions must be the same.


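Theorem 4.1. Let $\gamma_i, \gamma_j \in \Gamma$ and assume that $|\mathcal{B}| > 2$. If $\gamma_i$ and $\gamma_j$ both achieve Kraft access, then $R_i = R_j$, where $R_i = R(\gamma_i(\mathbb{D}))$.

Proof: Consider a table lookup question $\gamma$ on $\rho: \mathbb{D} \to \mathcal{B}^*$. Since $\mathbb{D} = \bigcup_{i \in J} X^i$, where $J \subseteq \mathbb{N}$, either $R(\gamma(\mathbb{D})) = X$ or $R(\gamma(\mathbb{D})) = X \cup \{\varnothing\}$. Suppose $|R_i| \ne |R_j|$. Then $|R_i| - |R_j| = \pm 1$. By Theorem 3.12 we know that $|R_i| - |R_j| = n \cdot (|\mathcal{B}| - 1) = \pm 1$, and so the only solution is $|\mathcal{B}| = 2$, $n = \pm 1$. Thus, if $|\mathcal{B}| > 2$ we obtain a contradiction, proving that $|R_i| = |R_j|$, which implies that $R_i = R_j$. ∎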


It is easy to show that the condition |@| > 2 is necessary in the above theorem. 


Example 4.2. Let $\mathbb{D} = X \cup X^2$ where $X = \{a,b\}$ and define the representation $\rho: \mathbb{D} \to \{0,1\}^*$ on cells 0 to 2 by ("_" marks a cell that is not part of $\rho(d)$):

$\rho(a) = 00\_$    $\rho(ab) = 011$
$\rho(b) = 10\_$    $\rho(ba) = 110$
$\rho(aa) = 010$    $\rho(bb) = 111$

Then the table lookup question set $\Gamma = \{\gamma_1, \gamma_2\}$ can be solved by algorithms with access trees as shown in Figure 4.1. It is clear by observation of these trees that both $\gamma_1$ and $\gamma_2$ achieve Kraft access, and yet $R_1 = X$ whereas $R_2 = X \cup \{\varnothing\}$. ∎


Figure 4.1. Access trees for (a) $\gamma_1$ and (b) $\gamma_2$ of Example 4.2.
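The claims of Example 4.2 can be checked mechanically. In this Python sketch the two access trees are an assumption (Figure 4.1 is not reproduced) chosen to be consistent with the listed representation; the code confirms that each tree answers correctly on every $d$, that each has one leaf per answer with Kraft sum 1, and that the ranges differ as stated.

```python
from fractions import Fraction

NULL = "null"  # illustrative label for the null answer

# Example 4.2, cells 0-2 (a cell absent from the dict is not part of rho(d))
rho = {"a": {0: 0, 1: 0}, "b": {0: 1, 1: 0},
       "aa": {0: 0, 1: 1, 2: 0}, "ab": {0: 0, 1: 1, 2: 1},
       "ba": {0: 1, 1: 1, 2: 0}, "bb": {0: 1, 1: 1, 2: 1}}

def node(cell, zero, one):
    return ("node", cell, {0: zero, 1: one})

def leaf(r):
    return ("leaf", r)

# Assumed trees consistent with rho
trees = {1: node(0, leaf("a"), leaf("b")),
         2: node(1, leaf(NULL), node(2, leaf("a"), leaf("b")))}

def run(tree, memory):
    """Follow the tree through memory; KeyError if it reads a cell outside rho(d)."""
    while tree[0] == "node":
        tree = tree[2][memory[tree[1]]]
    return tree[1]

def leaves(tree, depth=0):
    if tree[0] == "leaf":
        return [(tree[1], depth)]
    return [x for child in tree[2].values() for x in leaves(child, depth + 1)]

ranges, kraft = {}, {}
for i, tree in trees.items():
    for d, memory in rho.items():
        assert run(tree, memory) == (d[i - 1] if i <= len(d) else NULL)
    labels = [r for r, _ in leaves(tree)]
    assert len(labels) == len(set(labels))      # one leaf per answer
    ranges[i] = set(labels)
    kraft[i] = sum(Fraction(1, 2 ** depth) for _, depth in leaves(tree))

print(ranges, kraft)   # both Kraft sums equal 1; R_1 lacks the null answer, R_2 has it
```

Since $|\mathcal{B}| - 1 = 1$, Theorem 3.12 places no restriction on $|R_1| - |R_2|$ here, which is why the example escapes Theorem 4.1.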


It immediately follows from the previous theorem that if we have Kraft access for the set of table lookup questions, and $|\mathcal{B}| > 2$, then $\lambda \in \mathbb{D}$ except when $\mathbb{D} = X^n$ for some $n$.


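Theorem 4.2. Let $|\mathcal{B}| > 2$. If all $\gamma_i \in \Gamma$ achieve Kraft access, then either $\lambda \in \mathbb{D}$ or else $\mathbb{D} = X^n$ for some $n \in \mathbb{N}^+$.

Proof: Let $\mathbb{D} \ne X^n$. Then there exist $d_1, d_2 \in \mathbb{D}$ such that $|d_1| < |d_2|$. So $\gamma_{|d_2|}(d_1) = \varnothing$ and $R(\gamma_{|d_2|}(\mathbb{D})) = X \cup \{\varnothing\}$. Now assume that $\lambda \notin \mathbb{D}$. Then $R(\gamma_1(\mathbb{D})) = X$. But by Theorem 4.1 this says that $\gamma_1$ and $\gamma_{|d_2|}$ can't both achieve Kraft access, a contradiction. Therefore $\lambda \in \mathbb{D}$. ∎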

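Thus, if $|\mathcal{B}| > 2$, then $\lambda \notin \mathbb{D}$ implies that $\mathbb{D} = X^n$ for some $n$. Moreover, if $\mathbb{D} \ne X^n$, then $\lambda \in \mathbb{D}$, so $R_1 = X \cup \{\varnothing\}$ and $\varnothing \in R_1$.

Corollary 4.2.1. Let $|\mathcal{B}| > 2$. If all $\gamma_i \in \Gamma$ achieve Kraft access and there is no $n \in \mathbb{N}^+$ such that $\mathbb{D} = X^n$, then $R(\gamma_i(\mathbb{D})) = X \cup \{\varnothing\}$ for all $\gamma_i \in \Gamma$.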
4.2 Kraft Access with Overlapping Access Sets 


In this section we discuss achieving Kraft access for the set of table lookup questions $\Gamma$ and frequently refer to the set of memory cells accessed in order to answer some $\gamma_i \in \Gamma$.

Definition. Let $\rho$ be a representation $\rho: \mathbb{D} \to \mathcal{B}^*$, and let $\gamma_i, \gamma_j \in \Gamma$. Then we say that $\gamma_i$ and $\gamma_j$ have overlapping access sets if, for some $d \in \mathbb{D}$,

$$\{[\gamma_i(\rho(d))]\} \cap \{[\gamma_j(\rho(d))]\} \ne \emptyset.$$

We shall show that, for $|\mathcal{B}| > 2$, if all $\gamma_i \in \Gamma$ achieve Kraft access then there can be no overlapping access sets (see Theorem 4.4). For the case $|\mathcal{B}| = 2$, two access sets $\{[\gamma_i(\rho(d))]\}$ and $\{[\gamma_j(\rho(d))]\}$ can overlap, but in at most one cell and only when $X^k \not\subseteq \mathbb{D}$ for all $i \le k < j$ (see Theorems 4.6 and 4.8). Where all $\gamma_i \in \Gamma$ achieve Kraft access we also show that

$$\sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)| + |\Gamma| - 1,$$

and if the $\gamma_i$ do not have overlapping access sets then

$$\sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)|$$

(see Corollaries 4.7.1 and 4.5.1).
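Consider any representation $\rho$ and suppose that $\gamma_i, \gamma_j \in \Gamma$ meet Kraft access. Our first theorem says that if $\gamma_i(\rho(d_1))$ and $\gamma_j(\rho(d_1))$ access some cell in common, then $\mathbb{D}$ does not include all strings whose $i$-th component is $d_1(i)$ and whose $j$-th component ranges over $R_j$, or all strings whose $i$-th component ranges over $R_i$ and whose $j$-th component is $d_1(j)$.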


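Theorem 4.3. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^*$ and let $\gamma_i, \gamma_j \in \Gamma$ each achieve Kraft access. Suppose there exists $d_1 \in \mathbb{D}$ such that $\gamma_i(\rho(d_1))$ and $\gamma_j(\rho(d_1))$ access some cell in common. Then

$$\neg(\forall r \in R_j)(\exists d_2 \in \mathbb{D})(d_2(i) = d_1(i) \text{ and } d_2(j) = r)$$
$$\text{and } \neg(\forall r \in R_i)(\exists d_2 \in \mathbb{D})(d_2(i) = r \text{ and } d_2(j) = d_1(j)).$$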


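Proof: For any $d_n \in \mathbb{D}$, let $\rho(d_n) \subseteq m_n$. Suppose there is some $d_1 \in \mathbb{D}$ such that $\gamma_i(\rho(d_1))$ and $\gamma_j(\rho(d_1))$ both access cell $k$. Let $m_1(k) = b_1 \in \mathcal{B}$. Since $\gamma_i, \gamma_j \in \Gamma$ achieve Kraft access (so each answer labels exactly one leaf) and access cell $k$, then, for all $d_2 \in \mathbb{D}$,

$$d_2(i) = d_1(i) \Rightarrow m_2(k) = b_1,$$
$$d_2(j) = d_1(j) \Rightarrow m_2(k) = b_1.$$

Since $\gamma_j$ achieves Kraft access, we know there is some string $d_3 \in \mathbb{D}$ such that $m_3(k) \ne b_1$ and $\gamma_j$ accesses cell $k$. So there is no way to represent a string $d_4$ where

$$d_4(i) = d_1(i) \text{ and } d_4(j) = d_3(j).$$

Similarly, since $\gamma_i$ achieves Kraft access, there is some $d_3' \in \mathbb{D}$ such that $m_3'(k) \ne b_1$ and $\gamma_i$ accesses cell $k$, and there is no way to represent a string $d_5$ where

$$d_5(i) = d_3'(i) \text{ and } d_5(j) = d_1(j). \quad ∎$$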


The intuition behind the preceding theorem can perhaps best be seen by picturing 
the access trees for two table lookup questions, as we do in the following example. 
This gives us an example of overlapping storage, although we obviously can't 


represent all strings in the product of the ranges. 


Example 4.3. Let $\mathcal{B} = \{0,1,2\}$, and let $X = \{x_i \mid 1 \le i \le 9\}$; i.e., $|\mathcal{B}| = 3$ and $|X| = 9$. Suppose $\gamma_1$ and $\gamma_2$ have the ternary access trees shown in Figure 4.2 and therefore achieve Kraft access. These trees indicate that, for instance,
Axo %_) = 0121 and p(x,g-x,) = 12202. The only time we have overlapping 
access sets is for d € ID such that d(1) =x,, xz, or Xg; ie, ¥,(p(d)) =X, Xz, 
or xX, So we can certainly represent any pairs of strings in x, X, where 
xX, ¢ Le ekaa Nal It is also possible to represent the pairs of strings Xg° X4) X7° Xa, 


and xg: xj where K€ {X4,X a} 1 
Figure 4.2. Access trees for $\gamma_1$ and $\gamma_2$ of Example 4.3.


So if $\gamma_i$ and $\gamma_j$ do overlap in access of cell $k$, then it is not possible for $\mathbb{D}$ to include a string $d_n$ such that $\gamma_i(\rho(d_n))$ requires some value $b_1$ in cell $k$ and $\gamma_j(\rho(d_n))$ requires some value $b_2 \ne b_1$ in cell $k$. If $\gamma_i$ meets Kraft access, then its access tree is full, so there will be at least $|\mathcal{B}|$ elements $d \in \mathbb{D}$ such that $\gamma_i(\rho(d))$ accesses cell $k$. Similarly for $\gamma_j$. Let $S$ be the set of strings in the domain that agree with $d_1$ in every position except the $j$-th:

$$S = \{d \in \mathbb{D} \mid d(n) = d_1(n) \text{ for all } n \ne j\}.$$

Then $|S| \le |R_j| - (|\mathcal{B}| - 1)$, since there must be at least $|\mathcal{B}| - 1$ characters $r \in R_j$ such that we cannot represent any string in $X^*$ whose $i$-th component is $d_1(i)$ and whose $j$-th component is $r$.


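Lemma 4.1. Consider any representation $\rho: \mathbb{D} \to \mathcal{B}^*$ and let $\gamma_i, \gamma_j \in \Gamma$ achieve Kraft access. Suppose that for $d_1 \in \mathbb{D}$, $\gamma_i$ and $\gamma_j$ access some cell in common. Then

$$\Big|\bigcup_{d \in \mathbb{D},\ d(i) = d_1(i)} \{d(j)\}\Big| \le |R_j| - |\mathcal{B}| + 1.$$

Proof: $\gamma_i(\rho(d_1))$ and $\gamma_j(\rho(d_1))$ access some cell $k$ in common. Since $\gamma_i$ meets Kraft access, for $\rho(d_1) \subseteq m_1$ the answer $\gamma_i(\rho(d_1))$ corresponds to $m_1(k) = b \in \mathcal{B}$, so every $d$ with $d(i) = d_1(i)$ has $m(k) = b$. Since $\gamma_j$ also meets Kraft access, there are at least $|\mathcal{B}| - 1$ values for $d(j)$ that do not have $m(k) = b$. Thus,

$$\Big|\bigcup_{d \in \mathbb{D},\ d(i) = d_1(i)} \{d(j)\}\Big| \le |R_j| - |\mathcal{B}| + 1. \quad ∎$$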
d€ID 
Recall that for a pair of table lookup questions $\gamma_i$ and $\gamma_j$, where $i < j$, we have $R_i = R_j$ if $\gamma_i$ and $\gamma_j$ achieve Kraft access and $|\mathcal{B}| > 2$. If $d(i) = x \in X$, then all we know is that $d(j) \in X \cup \{\varnothing\}$. On the other hand, we know that if $d(i) = \varnothing$, then $d(j) = \varnothing$; in this case there are $|X|$ combinations of $d(i)$ and $d(j)$ that do not exist for any $d \in \mathbb{D}$. So perhaps there could be some representation scheme that would allow us to overlap accesses. The next theorem follows from Lemma 4.1 and shows that there is no such scheme.


Theorem 4.4. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^*$, where $|\mathcal{B}| > 2$, and let $\gamma_i, \gamma_j \in \Gamma$ each achieve Kraft access. Then, for all $d \in \mathbb{D}$, $\gamma_i(\rho(d))$ and $\gamma_j(\rho(d))$ access no cells in common.

Proof: Assume there exists $d_1 \in \mathbb{D}$ such that $\gamma_i(\rho(d_1))$ and $\gamma_j(\rho(d_1))$ each access cell $k$, $i < j$. Then since all $\gamma_i$ achieve Kraft access, for all $b \in \mathcal{B}$ there is some $d_n \in \mathbb{D}$ such that $\gamma_i(\rho(d_n))$ causes cell $k$ to be accessed and $m_n(k) = b$, where $\rho(d_n) \subseteq m_n$. Since not all leaf descendants of node $k$ in the access tree for $\gamma_i$ can be labelled $\varnothing$, there is some $d_2 \in \mathbb{D}$ such that $d_2(i) \ne \varnothing$ and $\gamma_i(\rho(d_2))$ accesses cell $k$. If we let $\mathbb{D}_2 = \{d \in \mathbb{D} \mid d(i) = d_2(i)\}$, then we have

$$\Big|\bigcup_{d \in \mathbb{D}_2} \{d(j)\}\Big| \le |R_j| - |\mathcal{B}| + 1 \le |X| + 2 - |\mathcal{B}| < |X|.$$

But by the way we have defined a problem domain, there are $|X|$ data bases $d \in \mathbb{D}$ that differ from $d_2$ only in the $j$-th position. This gives a contradiction and so, for all $d \in \mathbb{D}$, $\gamma_i(\rho(d))$ and $\gamma_j(\rho(d))$ do not have overlapping access sets. ∎


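Since for any $d \in \mathbb{D}$ each $\gamma_i$ accesses a distinct set of cells, the total number of accesses made by the various $\gamma_i$ cannot be more than $|\rho(d)|$.

Theorem 4.5. Consider any representation $\rho: \mathbb{D} \to \mathcal{B}^*$ and assume all $\gamma_i \in \Gamma$ achieve Kraft access. If $\gamma_i$ and $\gamma_j$ access no cells in common, for all $i \ne j$, then for all $d \in \mathbb{D}$

$$\sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)|.$$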
From Theorems 4.4 and 4.5 we can immediately get the following result.


Corollary 4.5.1. Consider any representation $\rho: \mathbb{D} \to \mathcal{B}^*$, where $|\mathcal{B}| > 2$, and let all $\gamma_i \in \Gamma$ achieve Kraft access. Then

$$\sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)|.$$


Unfortunately, Theorem 4.4 does not hold for $|\mathcal{B}| = 2$. In other words, it is possible for $\gamma_i$ and $\gamma_j$ to achieve Kraft access and also access some cell in common.


Example 4.4. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b\}$, and $\mathbb{D} = \{\lambda\} \cup X^2 \cup X^3$. Consider the representation $\rho: \mathbb{D} \to \mathcal{B}^*$ defined as follows, where the cells are numbered 0 to 4 and "_" marks a cell that is not part of $\rho(d)$:

d      ρ(d)
λ      10_0_
aa     0100_
ab     0110_
ba     1100_
bb     1110_
aaa    01010
aab    01011
aba    01110
abb    01111
baa    11010
bab    11011
bba    11110
bbb    11111

Since $\lambda \in \mathbb{D}$, $R_i = \{a, b, \varnothing\}$ for $i \in \{1,2,3\}$. Possible access trees for $\gamma_1, \gamma_2, \gamma_3$ are shown in Figure 4.3. Notice that $\gamma_1$ and $\gamma_2$ may both access cell 1, and we have the following storage allocation:

cell       0      1              2      3      4
used by    γ_1    γ_1 and γ_2    γ_2    γ_3    γ_3

Without altering the access trees, we could extend $\rho$ and $\mathbb{D}$ so as to also include the element $a \in X$, by letting $\rho(a) = 00\_0\_$. It would not, however, be possible to similarly include $b$ in the domain, because $\rho(b)$ would require cell 1 to be set to 1 and also to 0.

Figure 4.3. Access trees for $\gamma_1, \gamma_2, \gamma_3$ of Example 4.4.


We can see that Corollary 4.5.1 does not hold for $|\mathcal{B}| = 2$, since for $d = bab$, $\rho(bab) = 11011$ and

$$\sum_{i=1}^{3} \#[\gamma_i(\rho(bab))] = 2 + 2 + 2 = 6 > 5 = |\rho(bab)|.$$

Notice also that $\rho$ does not achieve Kraft storage:

$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = 2^{-3} + 4 \cdot 2^{-4} + 8 \cdot 2^{-5} = \tfrac{5}{8} < 1. \quad ∎$$
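The figures quoted in Example 4.4 can be verified with a short Python sketch. Figure 4.3 is not reproduced, so the three access trees below are an assumption, reconstructed to agree with the table, with the remark that $\gamma_1$ and $\gamma_2$ share cell 1, and with the storage allocation. The code checks every answer, Kraft access for each tree, the count of 6 accesses for $bab$, the bound of Corollary 4.7.1 for every $d$, and the Kraft storage sum $5/8$.

```python
from fractions import Fraction

NULL = "null"  # illustrative label for the null answer

def parse(cells):
    """'10_0_' -> {cell: bit}; '_' means the cell is not part of rho(d)."""
    return {k: int(c) for k, c in enumerate(cells) if c != "_"}

def node(cell, zero, one):
    return ("node", cell, {0: zero, 1: one})

def leaf(r):
    return ("leaf", r)

def run(tree, memory):
    """Return (answer, number of accesses); reading a cell outside rho(d) raises KeyError."""
    accesses = 0
    while tree[0] == "node":
        accesses += 1
        tree = tree[2][memory[tree[1]]]
    return tree[1], accesses

def leaf_depths(tree, depth=0):
    if tree[0] == "leaf":
        return [(tree[1], depth)]
    return [x for child in tree[2].values() for x in leaf_depths(child, depth + 1)]

rho = {d: parse(c) for d, c in {
    "": "10_0_", "aa": "0100_", "ab": "0110_", "ba": "1100_", "bb": "1110_",
    "aaa": "01010", "aab": "01011", "aba": "01110", "abb": "01111",
    "baa": "11010", "bab": "11011", "bba": "11110", "bbb": "11111"}.items()}

# Assumed trees: gamma_1 and gamma_2 both read cell 1, whose 0-branch is the null leaf
gammas = [
    node(0, leaf("a"), node(1, leaf(NULL), leaf("b"))),
    node(1, leaf(NULL), node(2, leaf("a"), leaf("b"))),
    node(3, leaf(NULL), node(4, leaf("a"), leaf("b"))),
]

for i, tree in enumerate(gammas, start=1):
    labels = [r for r, _ in leaf_depths(tree)]
    assert len(labels) == len(set(labels))                               # distinct leaves
    assert sum(Fraction(1, 2 ** a) for _, a in leaf_depths(tree)) == 1   # Kraft access
    for d, memory in rho.items():
        assert run(tree, memory)[0] == (d[i - 1] if i <= len(d) else NULL)

def total_accesses(d):
    return sum(run(tree, rho[d])[1] for tree in gammas)

assert all(total_accesses(d) <= len(rho[d]) + len(gammas) - 1 for d in rho)
storage = sum(Fraction(1, 2 ** len(m)) for m in rho.values())
print(total_accesses("bab"), len(rho["bab"]), storage)   # 6 5 5/8
```

The empty string plays the role of $\lambda$; for it, every $d(i)$ lies beyond the end of the list, so each tree must return the null answer.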


The following lemma shows, for $|\mathcal{B}| = 2$, that if $\gamma_i$ and $\gamma_j$ each have Kraft access, and if they both access cell $k$, then the access trees for $\gamma_i$ and $\gamma_j$ each have a node labelled $k$ leading to a leaf $\varnothing$ via a branch labelled $b \in \mathcal{B}$.


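Lemma 4.2. Let $|\mathcal{B}| = 2$ and let $b, b' \in \mathcal{B}$, $b \ne b'$. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^*$, and assume that $\gamma_i, \gamma_j \in \Gamma$ achieve Kraft access and that

$$k \in \Big(\bigcup_{d \in \mathbb{D}} \{[\gamma_i(\rho(d))]\}\Big) \cap \Big(\bigcup_{d \in \mathbb{D}} \{[\gamma_j(\rho(d))]\}\Big).$$

Choose elements $x_1, x_2 \in R_i$ and $x_3, x_4 \in R_j$ such that $m_1(k) = b$, $m_2(k) = b'$, $m_3(k) = b$, $m_4(k) = b'$, where $m_n \supseteq \rho(x_n)$. Then either $x_1 = x_3 = \varnothing$ or $x_2 = x_4 = \varnothing$.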


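Proof: Clearly $\rho$ cannot represent a string $d_1$ where $d_1(i) = x_1$ and $d_1(j) = x_4$, or a string $d_2$ where $d_2(i) = x_2$ and $d_2(j) = x_3$, since each would require cell $k$ to hold both $b$ and $b'$. There are two cases to consider:

(i) If $x_1 \in X$ then $x_4 = \varnothing$, since we do not necessarily need to represent $d(i) \in X$ and $d(j) = \varnothing$, but we must be able to represent $d(i) \in X$ and $d(j) \in X$. Since the tree for $\gamma_j$ has only one leaf labelled $\varnothing$, this tells us that $x_3 \ne \varnothing$ and so $x_3 \in X$. Since we cannot represent $d_2$, then $x_2 = \varnothing$.

(ii) If $x_1 = \varnothing$, then $x_2 \ne \varnothing$, so $x_2 \in X$. Since we cannot represent $d_2$, and every pair in $X \times X$ must be representable, then $x_3 = \varnothing$. ∎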


In Example 4.4, the access sets for $\gamma_1$ and $\gamma_2$ each included cell 1. Notice that in each of their access trees, the left branch from the node labelled 1 led to the leaf $\varnothing$; using the terminology of Lemma 4.2, $x_2 = x_4 = \varnothing$.

Lemma 4.2 allows us to prove that at most one cell can be in two access sets, if 


we achieve Kraft access. 


Theorem 4.6. Assume $\gamma_i, \gamma_j \in \Gamma$ achieve Kraft access. Then the access sets for $\gamma_i$ and $\gamma_j$ contain at most one cell in common.

Proof: If $\gamma_i$ and $\gamma_j$ access two cells in common then by Lemma 4.2 each tree has two leaves labelled $\varnothing$, which violates our assumption of Kraft access. ∎


We can, in fact, make the even stronger statement that if we achieve Kraft access for all of $\Gamma$, then any table lookup question $\gamma_i \in \Gamma$ can access only one cell that any other $\gamma_j \in \Gamma$ accesses. The following theorem formalizes this.


Theorem 4.7. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^*$, and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. If $\gamma_i, \gamma_j$ both access cell $k_1$ and $\gamma_i, \gamma_m$ both access cell $k_2$, then $k_1 = k_2$.

Proof: By Lemma 4.2, we know that node $k_1$ in the access tree for $\gamma_i$ leads to a leaf labelled $\varnothing$. But node $k_2$ in the tree for $\gamma_i$ must also lead to a leaf $\varnothing$. Since $\gamma_i$ achieves Kraft access, there can be at most one leaf labelled $\varnothing$, and so $k_1 = k_2$. ∎


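This gives us a result similar to Theorem 4.5, for the case where we allow access overlap.

Corollary 4.7.1. Consider any representation $\rho: \mathbb{D} \to \mathcal{B}^*$ and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. If we allow access overlap, then

$$\sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)| + |\Gamma| - 1.$$

Proof: From Theorem 4.5 we recall that where there is no access overlap,

$$\sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)|.$$

Now from Theorem 4.7 we know that each $\gamma_i$ reads at most one cell that some other $\gamma_j$ also reads. A shared cell read by $t$ questions contributes $t - 1$ accesses beyond its single count in $|\rho(d)|$, and since each $\gamma_i$ takes part in at most one shared cell, these extra accesses total at most $|\Gamma| - 1$. Hence

$$\sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))] \le |\rho(d)| + |\Gamma| - 1. \quad ∎$$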
i= 1 


Example 4.5. Recall Example 4.4, where

$$\sum_{i=1}^{3} \#[\gamma_i(\rho(bab))] = 2 + 2 + 2 = 6 \le |\rho(bab)| + |\Gamma| - 1 = 5 + 3 - 1 = 7. \quad ∎$$


The next example verifies that, in fact, the bound in the above corollary is the best possible. We achieve this bound when all $\gamma_i \in \Gamma$ access some cell in common.


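Example 4.6. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b\}$, and $\mathbb{D} = \{\lambda\} \cup X^3$. Consider the representation $\rho: \mathbb{D} \to \mathcal{B}^*$ defined as follows, where the cells are numbered 0 to 3 and "_" marks a cell that is not part of $\rho(d)$:

d      ρ(d)
λ      10_0
aaa    0101
aab    0100
aba    0111
abb    0110
baa    1101
bab    1100
bba    1111
bbb    1110

Consider the access trees for $\gamma_1, \gamma_2, \gamma_3$ shown in Figure 4.4. Then it is easy to see that

$$\sum_{i=1}^{3} \#[\gamma_i(\rho(d))] \le |\rho(d)| + |\Gamma| - 1.$$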


In particular,

$$\sum_{i=1}^{3} \#[\gamma_i(\rho(\lambda))] = 5 \le 3 + 3 - 1$$

and

$$\sum_{i=1}^{3} \#[\gamma_i(\rho(bba))] = 6 \le 4 + 3 - 1. \quad ∎$$

Figure 4.4. Access trees for $\gamma_1, \gamma_2, \gamma_3$ of Example 4.6.


Essentially, we were able to allow access overlap in Example 4.4 because we did not need to represent the length-1 strings $a$ and $b$. This was because we restricted $\mathbb{D}$ so that $X^1 \not\subseteq \mathbb{D}$. If it is necessary, however, to represent the situation where $\gamma_1(\rho(d)) \ne \varnothing$ and $\gamma_2(\rho(d)) = \varnothing$, then no overlap between $\gamma_1$ and $\gamma_2$ is possible. In fact, for $|\mathcal{B}| = 2$, this works in both directions, as the next theorem shows.


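Theorem 4.8. Let $|\mathcal{B}| = 2$ and let $\gamma_i, \gamma_j \in \Gamma$, $i < j$, each achieve Kraft access. There exists a representation $\rho: \mathbb{D} \to \mathcal{B}^*$ such that $\gamma_i$ and $\gamma_j$ access some cell in common if and only if $X^k \not\subseteq \mathbb{D}$ for all $i \le k < j$.

Proof: ($\Rightarrow$) Suppose that $X^k \subseteq \mathbb{D}$ for some $k$ with $i \le k < j$. As in the proof of Theorem 4.4, we can assume without loss of generality that $\gamma_i(\rho(d_2)) \ne \varnothing$, and let $\mathbb{D}_2 = \{d \in \mathbb{D} \mid d(i) = d_2(i)\}$. Then $|\bigcup_{d \in \mathbb{D}_2} \{d(j)\}| = |X| + 1$. But by Lemma 4.1, if $\gamma_i$ and $\gamma_j$ access some cell in common, then $\gamma_j(\rho(d))$, for $d \in \mathbb{D}_2$, can take on at most

$$|R_j| - |\mathcal{B}| + 1 \le |X| + 2 - |\mathcal{B}| = |X| < |X| + 1$$

values. So $\gamma_i$ and $\gamma_j$ access no cells in common.

($\Leftarrow$) If there exists no $k$ with $i \le k < j$ such that $X^k \subseteq \mathbb{D}$, then

$$\gamma_i(\rho(d)) \in X \Rightarrow \gamma_j(\rho(d)) \in X$$
$$\text{and } \gamma_i(\rho(d)) = \varnothing \Rightarrow \gamma_j(\rho(d)) = \varnothing.$$

We can always construct a representation $\rho$ such that $\gamma_i$ and $\gamma_j$ will both access some cell $k$. Let the access tree for $\gamma_i$ have exactly one node corresponding to an access of cell $k$, and let this node be at a greater depth than any other nonleaf node. Let the left branch from this node lead to a leaf labelled $\varnothing$ and the right branch lead to some other leaf $x_1 \in X$. Then construct the access tree for $\gamma_j$ such that the root is labelled $k$, and its left branch leads directly to a leaf labelled $\varnothing$. This allows us to represent all strings whose $i$-th and $j$-th components lie in $X \cdot X$ or equal $\varnothing \cdot \varnothing$, and yet $\gamma_i$ and $\gamma_j$ both access cell $k$. ∎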


4.3 Achieving Kraft Storage and Kraft Access 


We have seen in Example 4.4 that it is possible to have Kraft access with 
overlapping access sets, although that particular representation did not achieve 
Kraft storage. This leads us to wonder whether it is even possible to achieve both 


Kraft storage and Kraft access; the following example shows us that it is. 


Example 4.7. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b\}$, $\mathbb{D} = \{\lambda\} \cup X^2$, and define $\rho: \mathbb{D} \to \mathcal{B}^*$ by

d      ρ(d)
λ      0
aa     100
ab     101
ba     110
bb     111

Now consider the access trees for $\gamma_1$ and $\gamma_2$ as shown in Figure 4.4. Clearly $\gamma_1$ and $\gamma_2$ each achieve Kraft access. It is also the case, however, that $\rho$ achieves Kraft storage, since

$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = 2^{-1} + 4 \cdot 2^{-3} = 1.$$

Now notice that

$$\sum_{i=1}^{2} \#[\gamma_i(\rho(ab))] = \#[\gamma_1(\rho(ab))] + \#[\gamma_2(\rho(ab))] = 2 + 2 = 4 > 3 = |\rho(ab)|,$$

and so Corollary 4.5.1 does not hold for $|\mathcal{B}| = 2$, even when we achieve both Kraft storage and Kraft access. ∎
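A quick Python check of Example 4.7, assuming the natural trees in which both $\gamma_1$ and $\gamma_2$ read cell 0 first (this is where their access sets overlap; the figure itself is not reproduced). Each tree is full with distinct leaves at depths 1, 2, 2, so Kraft access holds; the code confirms the answers, the Kraft storage sum of 1, and the 4 accesses needed for $ab$.

```python
from fractions import Fraction

NULL = "null"  # illustrative label for the null answer

def node(cell, zero, one):
    return ("node", cell, {0: zero, 1: one})

def leaf(r):
    return ("leaf", r)

def run(tree, memory):
    """Return (answer, number of accesses)."""
    accesses = 0
    while tree[0] == "node":
        accesses += 1
        tree = tree[2][memory[tree[1]]]
    return tree[1], accesses

# Example 4.7: rho(lambda) = 0 uses cell 0 only; the strings of X^2 use cells 0-2
rho = {"": {0: 0},
       "aa": {0: 1, 1: 0, 2: 0}, "ab": {0: 1, 1: 0, 2: 1},
       "ba": {0: 1, 1: 1, 2: 0}, "bb": {0: 1, 1: 1, 2: 1}}

# Assumed trees: both read cell 0 first, so their access sets overlap
gammas = [node(0, leaf(NULL), node(1, leaf("a"), leaf("b"))),
          node(0, leaf(NULL), node(2, leaf("a"), leaf("b")))]

for i, tree in enumerate(gammas, start=1):
    for d, memory in rho.items():
        assert run(tree, memory)[0] == (d[i - 1] if i <= len(d) else NULL)

kraft_storage = sum(Fraction(1, 2 ** len(m)) for m in rho.values())
accesses_ab = sum(run(tree, rho["ab"])[1] for tree in gammas)
print(kraft_storage, accesses_ab, len(rho["ab"]))   # 1 4 3
```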


The main results of this section, Theorems 4.9 and 4.10, tell us that if we achieve both Kraft storage and Kraft access then our domain must be of the form $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X^n$.

We are now in a position to prove the first of our two main results of this section: if we are to have Kraft storage and access and not allow overlapping access sets, then $\mathbb{D} = X^n$. We first prove the following lemma.


Figure 4.4. Access trees corresponding to $\gamma_1$ and $\gamma_2$ of Example 4.7.


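Lemma 4.3. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^*$ and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. Then, for $k \le |\Gamma|$,

$$\sum_{s \in R^k} |\mathcal{B}|^{-\sum_{i=1}^{k} \#[\gamma_i(\rho(s))]} = 1, \qquad (4.1)$$

where $R^k \triangleq R_1 \cdot R_2 \cdot \ldots \cdot R_k$.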


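Proof: We prove this result by induction on $k$.

Basis: Since $\gamma_1$ achieves Kraft access, by Theorem 3.7 we have

$$\sum_{s \in R^1} |\mathcal{B}|^{-\#[\gamma_1(\rho(s))]} = \sum_{s \in R^1} |\mathcal{B}|^{-\alpha_1(s)} = 1.$$

Induction step: Let $R_{k+1} = \{r_1, r_2, \ldots, r_t\}$ and assume that (4.1) holds for $R^k$. Then

$$\sum_{s \in R^{k+1}} |\mathcal{B}|^{-\sum_{i=1}^{k+1} \#[\gamma_i(\rho(s))]} = \sum_{s \in R^k} |\mathcal{B}|^{-\sum_{i=1}^{k} \#[\gamma_i(\rho(s))] - \#[\gamma_{k+1}(\rho(s \cdot r_1))]} + \cdots + \sum_{s \in R^k} |\mathcal{B}|^{-\sum_{i=1}^{k} \#[\gamma_i(\rho(s))] - \#[\gamma_{k+1}(\rho(s \cdot r_t))]}.$$

Since $\gamma_{k+1}$ achieves Kraft access, then for $s \in R^k$ and $r \in R_{k+1}$ we have

$$\#[\gamma_{k+1}(\rho(s \cdot r))] = \alpha_{k+1}(r).$$

This gives us

$$\sum_{s \in R^{k+1}} |\mathcal{B}|^{-\sum_{i=1}^{k+1} \#[\gamma_i(\rho(s))]} = |\mathcal{B}|^{-\alpha_{k+1}(r_1)} \sum_{s \in R^k} |\mathcal{B}|^{-\sum_{i=1}^{k} \#[\gamma_i(\rho(s))]} + \cdots + |\mathcal{B}|^{-\alpha_{k+1}(r_t)} \sum_{s \in R^k} |\mathcal{B}|^{-\sum_{i=1}^{k} \#[\gamma_i(\rho(s))]}.$$

By our inductive assumption, and since we are given that $\gamma_{k+1}$ achieves Kraft access for $k + 1 \le |\Gamma|$, this becomes

$$|\mathcal{B}|^{-\alpha_{k+1}(r_1)} + |\mathcal{B}|^{-\alpha_{k+1}(r_2)} + \cdots + |\mathcal{B}|^{-\alpha_{k+1}(r_t)} = 1. \quad ∎$$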
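Because the exponent in (4.1) is a sum, the left side factors into the product of the individual Kraft sums, which is what the induction exploits. The Python sketch below checks (4.1) numerically for hypothetical values of $\alpha_i$ (chosen so that each question meets Kraft access with equality; they do not come from any example in the text), using $\alpha_i$ in place of $\#[\gamma_i(\rho(s))]$ exactly as the proof does.

```python
from fractions import Fraction
from itertools import product

def kraft_sum(depths, base):
    return sum(Fraction(1, base ** a) for a in depths)

# Hypothetical alpha_i(r) values for three table lookup questions whose access trees
# are full binary trees with distinct leaf labels (Kraft access with equality).
alphas = [
    {"a": 1, "b": 2, "null": 2},
    {"a": 2, "b": 2, "c": 2, "null": 2},
    {"a": 1, "null": 1},
]
base = 2
assert all(kraft_sum(a.values(), base) == 1 for a in alphas)

totals = []
for k in range(1, len(alphas) + 1):
    # sum over s = (s_1, ..., s_k) in R_1 x ... x R_k of base ** -(alpha_1(s_1) + ... + alpha_k(s_k))
    total = sum(Fraction(1, base ** sum(alphas[i][s[i]] for i in range(k)))
                for s in product(*(alphas[i] for i in range(k))))
    totals.append(total)

print(totals)   # every entry equals 1, as (4.1) states
```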


We now prove our desired theorem. 


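Theorem 4.9. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^*$ which achieves Kraft storage and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. If for all $\gamma_i, \gamma_j \in \Gamma$ with $i \ne j$ and all $d \in \mathbb{D}$

$$\{[\gamma_i(\rho(d))]\} \cap \{[\gamma_j(\rho(d))]\} = \emptyset,$$

then $\mathbb{D} = X^n$.

Proof: Let $R^{|\Gamma|}$ denote the set of strings of length $|\Gamma|$ whose $i$-th element is chosen from $R_i$. Define the one-to-one function $g: \mathbb{D} \to R^{|\Gamma|}$ by

$$g(d) = \gamma_1(d) \cdot \gamma_2(d) \cdot \ldots \cdot \gamma_{|\Gamma|}(d).$$

Assume that $\lambda \in \mathbb{D}$. Then, by Corollary 4.2.1, $R(\gamma_i(\mathbb{D})) = X \cup \{\varnothing\}$ for all $i$. But for all $\gamma_i \in \Gamma$, $\gamma_i(\rho(d)) = \varnothing$ implies that $\gamma_{i+1}(\rho(d)) = \varnothing$. So $g(\mathbb{D}) \ne R^{|\Gamma|}$ since, e.g., $\varnothing \cdot x^{|\Gamma| - 1} \notin g(\mathbb{D})$ for $x \in X$. Because we have Kraft access and no overlapping access sets,

$$\begin{aligned}
\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} &\le \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-\sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(d))]} && \text{by Theorem 4.5} \\
&= \sum_{s \in g(\mathbb{D})} |\mathcal{B}|^{-\sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(s))]} && \text{since } g \text{ is 1-1} \\
&< \sum_{s \in R^{|\Gamma|}} |\mathcal{B}|^{-\sum_{i=1}^{|\Gamma|} \#[\gamma_i(\rho(s))]} && \text{since } g(\mathbb{D}) \subsetneq R^{|\Gamma|} \\
&= 1 && \text{by Lemma 4.3.}
\end{aligned}$$

This gives a contradiction, since we know that $\rho$ achieves Kraft storage. So $\lambda \notin \mathbb{D}$, which by Theorem 4.2 says that $\mathbb{D} = X^n$. ∎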


As we saw in Example 4.7, the condition that there be no access set overlap is 
necessary in the above theorem. 


From theorems 4.9 and 4.4, we have the following corollary. 


Corollary 4.9.1. Let $|\mathcal{B}| > 2$, and consider a representation $\rho: \mathbb{D} \to \mathcal{B}^*$ which achieves Kraft storage. If all $\gamma_i \in \Gamma$ achieve Kraft access, then $\mathbb{D} = X^n$.


Because we shall frequently consider domains of the form $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, it is worth noting that with a domain in this form, it is not possible to attain both Kraft storage and Kraft access.

Corollary 4.9.2. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, for $k > 1$, and consider a representation $\rho: \mathbb{D} \to \mathcal{B}^*$. Assume that all $\gamma_i \in \Gamma$ achieve Kraft access. Then $\rho$ does not achieve Kraft storage.


Although we have proved that Kraft access, Kraft storage, and no access set overlap imply that $\mathbb{D} = X^n$, we know by Example 4.7 that it is also possible to have, for some domain $\mathbb{D} \ne X^n$, both Kraft storage and access with access overlap. Example 4.7 is not an isolated case: the next example illustrates that it is not necessary that $|X| = 2$ or that $\mathbb{D} = \{\lambda\} \cup X^2$.


Example 4.8. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b,c,d\}$, $\mathbb{D} = \{\lambda\} \cup X^3$, and define $\rho: \mathbb{D} \to \mathcal{B}^*$ as indicated in Figure 4.5. Such a definition is possible because only cell 0 is in more than one access set, and $m(0) = 1$ for all $d \in \mathbb{D}$ except $d = \lambda$. For instance, with the cells numbered 0 to 7 and "_" marking a cell that is not part of $\rho(d)$:

d      ρ(d)
λ      0
aaa    10__0011
aba    10__0111
aca    10__1011
ada    10__1111
bac    110_0000
bbc    110_0100
bcc    110_1000
bdc    110_1100
cad    11100010
cbd    11100110
ccd    11101010
cdd    11101110

This system has overlapping access sets and achieves Kraft access. In fact, we also have Kraft storage, since

$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = 2^{-1} + 2^4 \cdot 2^{-6} + 2^4 \cdot 2^{-7} + 2^4 \cdot 2^{-8} + 2^4 \cdot 2^{-8} = 1. \quad ∎$$

Figure 4.5. Access trees corresponding to $\gamma_1$, $\gamma_2$, and $\gamma_3$ of Example 4.8.
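The table of Example 4.8 is consistent with the following access trees, which are an assumption reconstructed from it (Figure 4.5 is not reproduced): $\gamma_1$ reads cells 0 to 3 as a prefix code, $\gamma_2$ reads cells 4 and 5, and $\gamma_3$ reads cells 6 and 7, each after cell 0. The 2-bit code for a third symbol $b$ never occurs in the table, so it is guessed. The Python sketch checks every listed row, the Kraft access sums, and the Kraft storage sum; the table lists no strings beginning with $d$, so their size is assumed equal to that of the $c$ rows.

```python
from fractions import Fraction

NULL = "null"  # illustrative label for the null answer

def parse(cells):
    """'10__0011' -> {cell: bit}; '_' marks a cell that is not part of rho(d)."""
    return {k: int(c) for k, c in enumerate(cells) if c != "_"}

def node(cell, zero, one):
    return ("node", cell, {0: zero, 1: one})

def leaf(r):
    return ("leaf", r)

def two_bits(cell, labels):
    """Read cells `cell` and `cell + 1`; bits 00, 01, 10, 11 select labels[0..3]."""
    return node(cell, node(cell + 1, leaf(labels[0]), leaf(labels[1])),
                node(cell + 1, leaf(labels[2]), leaf(labels[3])))

# Assumed trees; cell 0 is shared. The code 01 for a third symbol b is a guess.
trees = [
    node(0, leaf(NULL), node(1, leaf("a"), node(2, leaf("b"), node(3, leaf("c"), leaf("d"))))),
    node(0, leaf(NULL), two_bits(4, "abcd")),
    node(0, leaf(NULL), two_bits(6, "cbda")),
]

table = {"": "0",
         "aaa": "10__0011", "aba": "10__0111", "aca": "10__1011", "ada": "10__1111",
         "bac": "110_0000", "bbc": "110_0100", "bcc": "110_1000", "bdc": "110_1100",
         "cad": "11100010", "cbd": "11100110", "ccd": "11101010", "cdd": "11101110"}

def run(tree, memory):
    """Return (answer, set of cells read)."""
    read = set()
    while tree[0] == "node":
        read.add(tree[1])
        tree = tree[2][memory[tree[1]]]
    return tree[1], read

def depths(tree, depth=0):
    if tree[0] == "leaf":
        return [depth]
    return [x for child in tree[2].values() for x in depths(child, depth + 1)]

for d, cells in table.items():
    memory, read = parse(cells), set()
    for i, tree in enumerate(trees, start=1):
        answer, cells_read = run(tree, memory)
        assert answer == (d[i - 1] if i <= len(d) else NULL)
        read |= cells_read
    assert read == set(memory)          # every stored cell is read by some gamma_i

kraft_access = [sum(Fraction(1, 2 ** a) for a in depths(t)) for t in trees]

# 16 strings of X^3 begin with each symbol; the d rows are assumed to match the c rows.
size = {"a": len(parse(table["aaa"])), "b": len(parse(table["bac"])),
        "c": len(parse(table["cad"]))}
size["d"] = size["c"]
kraft_storage = Fraction(1, 2 ** len(parse(table[""]))) + sum(
    16 * Fraction(1, 2 ** size[x]) for x in "abcd")
print(kraft_access, kraft_storage)
```

The check that every stored cell is read by some $\gamma_i$ anticipates Theorem 4.11 below.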


Now we want to determine for what possible domains $\mathbb{D}$ we can get Kraft storage and access if we allow overlapping access sets. Certainly we know that $|\mathcal{B}| = 2$, and recalling Examples 4.7 and 4.8 we might suppose that $\mathbb{D}$ is of the form $\{\lambda\} \cup X^n$, as is indeed the case.
Lemma 4.4. Let $|\mathcal{B}| = 2$ and consider a representation $\rho: \mathbb{D} \to \mathcal{B}^*$ which achieves Kraft storage. Assume that $|\Gamma| > 1$ and that $\gamma_i, \gamma_j \in \Gamma$ achieve Kraft access, and

$$\Big(\bigcup_{d \in \mathbb{D}} \{[\gamma_i(\rho(d))]\}\Big) \cap \Big(\bigcup_{d \in \mathbb{D}} \{[\gamma_j(\rho(d))]\}\Big) = \{k\}.$$

Then the access trees for $\gamma_i$ and $\gamma_j$ each have root node labelled $k$.

Proof: Let $i < j$. By Lemma 4.2, we know that $\varnothing \in R_i$ and $\varnothing \in R_j$. Assume that the access tree for $\gamma_i$ has root with label $t_1 \ne k$ and that the access tree for $\gamma_j$ has root $t_2$. Without loss of generality, let the leaf in the tree for $\gamma_i$ with label $\varnothing$ have $m(t_1) = 0$; i.e., $\rho(\varnothing)$ has $m(t_1) = 0$. Since $t_1 \ne k$, the node $k$ must be a descendant of $t_1$, and there exists $x_1 \in X$ such that $\rho(x_1)$ also has $m(t_1) = 0$. Clearly there is some $x_2 \in X$ such that $\rho(x_2)$ has $m(t_1) = 1$. From Lemma 4.2 we know that in the tree for $\gamma_j$ we must also have the $\varnothing$ leaf a descendant of node $k$, with $m(k) = 0$. Thus $d_1 \notin \mathbb{D}$, where $d_1(i) = x_1$ and $d_1(j) = \varnothing$, since $\rho$ would require setting $m(k) = 1$ and $m(k) = 0$. Since $\gamma_i, \gamma_j$ achieve Kraft access, then by Theorem 4.8 it must be the case that $X^p \not\subseteq \mathbb{D}$ for $i \le p < j$. On the other hand, we know that $\rho$ does achieve Kraft storage. So by Theorem 3.3 there is some $d_2 \in \mathbb{D}$ such that $d_2(i) = x_2$ and $d_2(j) = \varnothing$, which contradicts the fact that $X^p \not\subseteq \mathbb{D}$ for $i \le p < j$. Thus $t_1 = k$, and we can similarly show that $t_2 = k$. ∎


This lemma allows us to prove our second main result of the section: if we have Kraft access, Kraft storage, and access overlap, then $\mathbb{D} = \{\lambda\} \cup X^n$.


Theorem 4.10. Consider a representation $\rho: \mathbb{D} \to \mathcal{B}^*$ which achieves Kraft storage, and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. If there exist $\gamma_i, \gamma_j \in \Gamma$ such that $\gamma_i$ and $\gamma_j$ have overlapping access sets, then $\mathbb{D} = \{\lambda\} \cup X^{|\Gamma|}$.

Proof: Assume $\gamma_i$ and $\gamma_j$ both access cell $k$, and assume there is some $\gamma_m \in \Gamma$ that does not access cell $k$. Then we can represent $d_1$ with $d_1(i) = \varnothing$, $d_1(j) = \varnothing$, $d_1(m) \in X$, indicating that we don't have Kraft storage, a contradiction. So if $\gamma_i$ and $\gamma_j$ both access cell $k$, then for all $\gamma_m \in \Gamma$, $\gamma_m$ accesses cell $k$. By Lemma 4.4, $\gamma_m$ has root node $k$ with one branch to the leaf $\varnothing$. Thus, we can represent exactly the strings $\varnothing^{|\Gamma|}$ and $X^{|\Gamma|}$, and so $\mathbb{D} = \{\lambda\} \cup X^{|\Gamma|}$. ∎


In Theorem 4.5 we showed that if we meet Kraft access and have no access set overlap, then $|\rho(d)|$ is an upper bound on the total number of accesses made in reading all the elements in $\rho(d)$. We now show that for any $|\mathcal{G}| \ge 2$, if we achieve Kraft storage then every cell must be accessed in answering some question $\gamma_i$. Thus $|\rho(d)|$ is a lower bound on the total number of accesses needed to read $\rho(d)$.


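Theorem 4.11. If the representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ achieves Kraft storage, then for all $d \in \mathbb{D}$:
$$k \in D(\rho(d)) \implies k \in \bigcup_{\gamma_i\in\Gamma} A[\gamma_i(\rho(d))],$$
where $A[\gamma_i(\rho(d))]$ denotes the set of cells accessed when $\gamma_i$ is asked of $\rho(d)$.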


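Proof: We define $S$ to be the set of cells accessed by asking each of the questions $\gamma_i$ of some $d_1 \in \mathbb{D}$:
$$S = \bigcup_{\gamma_i\in\Gamma} A[\gamma_i(\rho(d_1))].$$
We want to prove that $k \in D(\rho(d_1)) \implies k \in S$. Define the representation $\rho_1:\mathbb{D}\to\mathcal{G}^{\dagger}$ by
$$\rho_1(d) = \begin{cases} \rho(d) & \text{for } d \neq d_1,\\ \{(k, m(k)) \mid k \in S\} & \text{for } d = d_1, \end{cases}$$
where $m$ is the memory content given by $\rho(d_1)$. Then $\rho_1$ is a representation because $\rho$ is: for distinct $d_2, d_3 \in \mathbb{D}$ with $d_2 \neq d_1$ and $d_3 \neq d_1$, the values $\rho_1(d_2) = \rho(d_2)$ and $\rho_1(d_3) = \rho(d_3)$ are distinguishable because $\rho$ is a representation; and for $d_2 \neq d_1$, if $\rho_1(d_2)$ and $\rho_1(d_1)$ could not be distinguished, then $(\forall \gamma_i \in \Gamma)(\gamma_i(\rho(d_2)) = \gamma_i(\rho(d_1)))$, which forces $d_2 = d_1$.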


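Assume there exists $k \in D(\rho(d_1))$ such that $k \notin S$. Then $|\rho_1(d_1)| = |S| < |\rho(d_1)|$ and
$$\sum_{d\in\mathbb{D}} |\mathcal{G}|^{-|\rho_1(d)|} = \sum_{d\in\mathbb{D}-\{d_1\}} |\mathcal{G}|^{-|\rho(d)|} + |\mathcal{G}|^{-|\rho_1(d_1)|} > \sum_{d\in\mathbb{D}-\{d_1\}} |\mathcal{G}|^{-|\rho(d)|} + |\mathcal{G}|^{-|\rho(d_1)|} = \sum_{d\in\mathbb{D}} |\mathcal{G}|^{-|\rho(d)|} = 1.$$
This violates the fact that $\rho$ achieves Kraft storage, so $k \in D(\rho(d_1)) \implies k \in S$. ∎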


Corollary 4.11.1. If the representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ achieves Kraft storage, then for all $d \in \mathbb{D}$:
$$\sum_{i=1}^{|\Gamma|} a[\gamma_i(\rho(d))] \ge |\rho(d)|.$$


From theorems 4.5 and 4.11, we immediately have the following result. 


Theorem 4.12. Consider a representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ which achieves Kraft storage, and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. If there is no access set overlap, then for all $d \in \mathbb{D}$:
$$\sum_{i=1}^{|\Gamma|} a[\gamma_i(\rho(d))] = |\rho(d)|.$$


Since we are in general considering list problems where $\mathbb{D} = \bigcup_{i=0}^{n} X^i$, Theorem 4.9 holds for the cases of particular interest to us.

Corollary 4.12.1. If $\mathbb{D} = \bigcup_{i=0}^{n} X^i$, the representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ achieves Kraft storage, and all $\gamma_i \in \Gamma$ achieve Kraft access, then for all $d \in \mathbb{D}$:
$$\sum_{i=1}^{|\Gamma|} a[\gamma_i(\rho(d))] = |\rho(d)|.$$

Thus, for list problems where $\mathbb{D} = \bigcup_{i=0}^{n} X^i$, if all $\gamma_i \in \Gamma$ achieve Kraft access then $\gamma_i$ and $\gamma_j$ ($i \neq j$) access no cells in common.

4.4 Storage Consequences of Kraft Access 


We conclude this chapter by examining some consequences of Kraft access for the set of table lookup questions. In particular, achieving Kraft access tells us something about the minimum and maximum possible values of $|\rho(d)|$:
$$\max_{d\in\mathbb{D}} |\rho(d)| \ge |\Gamma|\left(\lceil \log_{|\mathcal{G}|} |R| \rceil - 1\right) \quad\text{and}\quad \min_{d\in\mathbb{D}} |\rho(d)| \ge |J| - 1.$$
In general we have even better bounds.

In order to lower bound $|\rho(d)|$, we first prove two lemmas.


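Lemma 4.5. Let $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ be any representation. Then
$$(\forall \gamma_i \in \Gamma)(\exists d \in \mathbb{D})\left(a[\gamma_i(\rho(d))] \ge \lceil \log_{|\mathcal{G}|} |R_i| \rceil\right).$$
Proof: By Theorem 3.13, $\max_{r\in R_i} a(r) \ge \lceil \log_{|\mathcal{G}|} |R_i| \rceil$, so
$$\max_{d\in\mathbb{D}} a[\gamma_i(\rho(d))] \ge \lceil \log_{|\mathcal{G}|} |R_i| \rceil,$$
and this immediately gives our desired result. ∎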
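As a small illustration of the quantity in Lemma 4.5, the bound $\lceil \log_{|\mathcal{G}|} |R_i| \rceil$ is the depth of the shallowest $|\mathcal{G}|$-ary tree that has at least $|R_i|$ leaves. The Python sketch below (the function name is ours, not the thesis's) computes it with integer arithmetic, which avoids floating-point error in the logarithm:

```python
def min_worst_case_accesses(num_answers: int, alphabet_size: int) -> int:
    """Smallest t with alphabet_size**t >= num_answers, i.e. ceil(log_|G| |R_i|).

    A |G|-ary access tree of depth t has at most |G|**t leaves, so some
    answer in R_i must sit at depth >= t; that many accesses are unavoidable.
    """
    if num_answers < 1 or alphabet_size < 2:
        raise ValueError("need at least one answer and |G| >= 2")
    depth, leaves = 0, 1
    while leaves < num_answers:
        leaves *= alphabet_size  # one more level multiplies the leaf capacity by |G|
        depth += 1
    return depth


print(min_worst_case_accesses(5, 2))  # 3
print(min_worst_case_accesses(3, 2))  # 2
```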


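Lemma 4.6. Let $\mathbb{D} = \bigcup_{i\in J} X^i$ and let $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ be any representation. Then
$$(\exists d_1 \in \mathbb{D})(\forall \gamma_i \in \Gamma)\left(a[\gamma_i(\rho(d_1))] \ge \lceil \log_{|\mathcal{G}|} |R_i| \rceil\right).$$

Proof: Let $d_1 \in \mathbb{D}$ be the database defined as follows:
$$d_1 \triangleq \left\{ d_1(i) = r_i \;\middle|\; (r_i \in X) \wedge \left(a(r_i) = \max_{r\in R_i} a(r)\right) \wedge (0 < i \le |\Gamma|) \right\}.$$
It is always possible to define such a $d_1$. Now, recalling Theorem 3.13,
$$a[\gamma_i(\rho(d_1))] = \max_{r\in R_i} a(r) \ge \lceil \log_{|\mathcal{G}|} |R_i| \rceil.$$
∎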


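We now show it is always the case that
$$\max_{d\in\mathbb{D}} |\rho(d)| \ge |\Gamma| \lceil \log_{|\mathcal{G}|} |R| \rceil - |\Gamma| + 1$$
and almost always the case that
$$\max_{d\in\mathbb{D}} |\rho(d)| \ge |\Gamma| \lceil \log_{|\mathcal{G}|} |R| \rceil.$$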


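Theorem 4.13. Let $\mathbb{D} = \bigcup_{i\in J} X^i$ and let $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ be any representation. Assume that all $\gamma_i \in \Gamma$ achieve Kraft access. Then we can conclude the following, where we write $|R|$ to denote $\min_{i} |R_i|$.

(a) $\max_{d\in\mathbb{D}} |\rho(d)| \ge |\Gamma|\left(\lceil \log_{|\mathcal{G}|} |R| \rceil - 1\right) + 1$.

(b) If there are no overlapping access sets for $\gamma_i \in \Gamma$, or if there is no $j \in \mathbb{N}^+$ such that $|X| = 2^j$, then
$$\max_{d\in\mathbb{D}} |\rho(d)| \ge |\Gamma| \lceil \log_{|\mathcal{G}|} |R| \rceil.$$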


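Proof: (a) By Corollary 4.7.1, for all $d \in \mathbb{D}$,
$$\sum_{i=1}^{|\Gamma|} a[\gamma_i(\rho(d))] \le |\rho(d)| + |\Gamma| - 1.$$
From Lemma 4.6, there exists a $d_1 \in \mathbb{D}$ such that
$$\sum_{i=1}^{|\Gamma|} a[\gamma_i(\rho(d_1))] \ge \sum_{i=1}^{|\Gamma|} \lceil \log_{|\mathcal{G}|} |R_i| \rceil.$$
Combining these, we get
$$|\rho(d_1)| \ge \sum_{i=1}^{|\Gamma|} \lceil \log_{|\mathcal{G}|} |R_i| \rceil - |\Gamma| + 1 \ge |\Gamma|\left(\lceil \log_{|\mathcal{G}|} |R| \rceil - 1\right) + 1,$$
and so $\max_{d\in\mathbb{D}} |\rho(d)| \ge |\Gamma|(\lceil \log_{|\mathcal{G}|} |R| \rceil - 1) + 1$.

(b) (i) If there are no overlapping access sets, then Theorem 4.5 tells us that
$$\sum_{i=1}^{|\Gamma|} a[\gamma_i(\rho(d))] \le |\rho(d)|,$$
and so, applying Lemma 4.6 as in (a), we conclude that $\max_{d\in\mathbb{D}} |\rho(d)| \ge |\Gamma| \lceil \log_{|\mathcal{G}|} |R| \rceil$.

(ii) If we do have overlapping access sets, then by Theorem 4.4 we know that $|\mathcal{G}| = 2$, and by Lemma 4.2 we know that $|R| = |X| + 1$. Assume there exists $j \in \mathbb{N}^+$ such that $2^j < |X| < 2^{j+1}$. Then in each $\gamma_i$ tree there is some $x_i \in X$ which labels a leaf at depth $j+1$. Now define $d_1 \in \mathbb{D}$ so that $d_1(i) = x_i$ for all $1 \le i \le |\Gamma|$. Then
$$|\rho(d_1)| \ge (j+1)\,|\Gamma|.$$
Since $\lceil \log_2 |R| \rceil = j+1$, we have $\max_{d\in\mathbb{D}} |\rho(d)| \ge |\Gamma| \lceil \log_{|\mathcal{G}|} |R| \rceil$. ∎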


In the following example we verify that if we allow overlapping access sets for $|X| = 2^j$, then we may have
$$\max_{d\in\mathbb{D}} |\rho(d)| < |\Gamma| \lceil \log_{|\mathcal{G}|} |R| \rceil.$$
Also, the bound in Theorem 4.13(a) is tight.


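Example 4.9. (a) In Example 4.8 we clearly have
$$\max_{d\in\mathbb{D}} |\rho(d)| = 8 < 3 \lceil \log_2 5 \rceil = |\Gamma| \lceil \log_{|\mathcal{G}|} |R| \rceil,$$
since $\rho(d)$ only occupies cells in the set $\{0,1,2,3,4,5,6,7\}$. Note, however, that
$$\max_{d\in\mathbb{D}} |\rho(d)| = 8 > 3\left(\lceil \log_2 5 \rceil - 1\right).$$
(b) In Example 4.7 we have
$$\max_{d\in\mathbb{D}} |\rho(d)| = 3 < 2 \lceil \log_2 3 \rceil = 4.$$
However, since
$$\max_{d\in\mathbb{D}} |\rho(d)| = 3 = 2\left(\lceil \log_2 3 \rceil - 1\right) + 1,$$
the bound in Theorem 4.13(a) is best possible. ∎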
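The arithmetic of Example 4.9 can be checked mechanically. This sketch (hypothetical helper names) only evaluates the two right-hand sides of Theorem 4.13 from $|\Gamma|$, $|R|$, and $|\mathcal{G}|$; the parameter triples $(3, 5, 2)$ and $(2, 3, 2)$ are those of Examples 4.8 and 4.7 as used in Example 4.9:

```python
def min_worst_case_accesses(num_answers: int, alphabet_size: int) -> int:
    """ceil(log_|G| num_answers), computed with integers only."""
    depth, leaves = 0, 1
    while leaves < num_answers:
        leaves *= alphabet_size
        depth += 1
    return depth


def theorem_4_13_bounds(num_questions: int, min_range_size: int, alphabet_size: int):
    """Return (bound (a), bound (b)) of Theorem 4.13 for |Gamma|, |R| = min |R_i|, |G|."""
    c = min_worst_case_accesses(min_range_size, alphabet_size)
    bound_a = num_questions * (c - 1) + 1  # holds whenever all gamma_i achieve Kraft access
    bound_b = num_questions * c            # needs no overlap, or |X| not a power of 2
    return bound_a, bound_b


print(theorem_4_13_bounds(3, 5, 2))  # (7, 9): Example 4.8 attains 8
print(theorem_4_13_bounds(2, 3, 2))  # (3, 4): Example 4.7 attains 3
```

Example 4.8 sits strictly between the two bounds, while Example 4.7 meets bound (a) exactly, which is the tightness claim.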


On the other hand, it is sometimes the case that we have overlapping access sets, $|X| = 2^j$, and also
$$\max_{d\in\mathbb{D}} |\rho(d)| \ge |\Gamma| \lceil \log_{|\mathcal{G}|} |R| \rceil.$$
Example 4.10 illustrates this.


Example 4.10. Let $\mathcal{G} = \{0,1\}$, $X = \{a,b,c,d\}$, and $\mathbb{D} = \{\lambda\} \cup X^2$. There exists (as the reader may verify) a representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ such that the trees shown in Figure 4.6 implement $\gamma_1$ and $\gamma_2$, respectively. For instance,

ρ(λ) = 1000
ρ(cd) = 101111
ρ(ca) = 1010
ρ(bc) = 0_1110

In this case
$$\max_{d\in\mathbb{D}} |\rho(d)| = 6 = |\Gamma| \lceil \log_{|\mathcal{G}|} |R| \rceil.$$
∎
d€ID Bl 


Figure 4.6. Trees for $\gamma_1$ and $\gamma_2$ of Example 4.10.

Let us now say something about the minimum size a representation can have. In particular, it is always the case that $|\rho(d)| \ge |J| - 1$. Where there is no access set overlap, it follows from Theorem 4.5 that $|\rho(d)| \ge |\Gamma|$.


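Theorem 4.14. Let $\mathbb{D} = \bigcup_{i\in J} X^i$ and let $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ be some representation. Assume that all $\gamma_i \in \Gamma$ achieve Kraft access.

(a) Then for all $d \in \mathbb{D}$:
$$|\rho(d)| \ge |J| - 1.$$
(b) If there are no overlapping access sets, then for all $d \in \mathbb{D}$:
$$|\rho(d)| \ge |\Gamma|.$$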


Proof: (b) By Theorem 3.9, $a[\gamma_i(\rho(d))] \ge 1$, and so if there is no access overlap, then Theorem 4.5 tells us that
$$|\rho(d)| \ge \sum_{i=1}^{|\Gamma|} a[\gamma_i(\rho(d))] \ge \sum_{i=1}^{|\Gamma|} 1 = |\Gamma|.$$
(a) On the other hand, suppose we allow overlapping access sets. If $|\rho(d)| < |J| - 1$, then there are at most $|J| - 2$ root node labels. So not all of the access trees $\gamma_j$, where $j \in J$ and $j \in \mathbb{N}^+$, can have distinct root node labels. Pick $j_1, j_2 \in J$, $j_1 < j_2$, such that $\gamma_{j_1}$ and $\gamma_{j_2}$ have the same root node label. Then by Theorem 4.8, $X^k \notin \mathbb{D}$ for any $j_1 \le k < j_2$. But we know that $X^{j_1} \in \mathbb{D}$, a contradiction. Therefore $|\rho(d)| \ge |J| - 1$. ∎

Example 4.11 shows us that the bound in Theorem 4.14 is best possible. 


Example 4.11. Let $\mathcal{G} = \{0,1\}$, $X = \{a,b\}$, and $\mathbb{D} = \bigcup_{i\in J} X^i$ for $J = \{0,3,5,6\}$. Consider the representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ that corresponds to the set of access trees shown in Figure 4.7. Then all $\gamma_i \in \Gamma$ achieve Kraft access, and
$$|\rho(\lambda)| = 3 = |J| - 1.$$
∎

Figure 4.7. Access trees for $\gamma_1, \ldots, \gamma_6$ of Example 4.11.


Note that the bound in Theorem 4.14(b) may also apply to a table lookup question set that has overlapping access sets; recall Example 4.4.

From Theorem 4.14 it immediately follows that if all $\gamma_i \in \Gamma$ achieve Kraft access and $\max_{d\in\mathbb{D}} |d|$ is unbounded, then infinite storage is required to represent each $d \in \mathbb{D}$.


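Corollary 4.14.1. Let $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ be any representation, and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. If $\neg(\exists k_1 \in \mathbb{N})(\max_{d\in\mathbb{D}} |d| < k_1)$, then for all $d \in \mathbb{D}$, $\neg(\exists k_2 \in \mathbb{N})(|\rho(d)| < k_2)$.
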
CHAPTER 5 
IMPLEMENTING THE TABLE LOOKUP QUESTION SET 


In Chapter 4 we discussed the set $\Gamma$ of table lookup questions and the consequences of achieving Kraft access for each $\gamma_i \in \Gamma$. In this chapter we introduce three major classes of representation schemes (fixed length, endmarker, and pointer) and then examine the table lookup question set in the context of each. The fixed length representation was chosen because it sometimes allows us to achieve both Kraft storage and Kraft access. The endmarker and pointer representations were chosen because they illustrate techniques commonly used for implementing variable length lists. In Chapter 6 we reconsider these representations in order to implement stacks.
5.1 Classes of Representations 


In this section we briefly discuss some basic definitions and representation techniques, and thereby motivate the definitions of fixed length, endmarker, and pointer representations, which are presented formally in sections 5.2, 5.3, and 5.4, respectively.


We begin with two notational definitions. 


Definition. Consider a function $b \in \mathcal{G}^{\dagger}$ and recall that
$$b = \{(n, m(n)) \mid n \in D(b)\},$$
where $b \subseteq m$. For $k \in \mathbb{N}$, we define
$$\{b\}_k \triangleq \{(n+k, m(n)) \mid n \in D(b)\}.$$
Thus, $\{b\}_k$ is the set $b \in \mathcal{G}^{\dagger}$ "displaced" by $k$, as illustrated in the following example.



Example 5.1. Consider a function $f:S\to\{0,1\}^{\dagger}$ and let $s_1 \in S$. If
$$f(s_1) = \{(0,0), (2,1), (4,0), (5,1)\},$$
then $\{f(s_1)\}_0 = f(s_1)$ and $\{f(s_1)\}_3 = \{(3,0), (5,1), (7,0), (8,1)\}$. ∎


Also, we shall frequently have occasion to refer to the concatenation of two strings in $\mathcal{G}^*$.

Definition. Let $f_1$ be a function $f_1:S\to\mathcal{G}^*$, let $f_2$ be a function $f_2:S\to\mathcal{G}^*$, and let $s_1, s_2 \in S$. We write $f_1(s_1)\cdot f_2(s_2)$ to denote the concatenation of the strings $f_1(s_1)$ and $f_2(s_2)$, where
$$f_1(s_1)\cdot f_2(s_2) \triangleq f_1(s_1) \cup \{f_2(s_2)\}_{|f_1(s_1)|}.$$
Thus,
$$|f_1(s_1)\cdot f_2(s_2)| = |f_1(s_1)| + |f_2(s_2)|$$
and
$$D(f_1(s_1)\cdot f_2(s_2)) = \{0, 1, \ldots, |f_1(s_1)| + |f_2(s_2)| - 1\}.$$
Notice that when $f_1(s_1) = \lambda$, then $|f_1(s_1)| = 0$ and $f_1(s_1)\cdot f_2(s_2) = f_2(s_2)$; in particular, $\lambda\cdot\lambda = \lambda$. In an obvious way, the definition can be extended to the concatenation of any countable number of strings.


Example 5.2. Define the function $f:\{a,b,c\}\to\{0,1\}^*$ by

f(a) = 0
f(b) = 10
f(c) = 11

Then
$$f(a)\cdot f(b) = \{(0,0)\} \cup \{(0,1),(1,0)\}_1 = \{(0,0),(1,1),(2,0)\} = 010$$
and
$$f(c)\cdot f(c) = \{(0,1),(1,1)\} \cup \{(0,1),(1,1)\}_2 = \{(0,1),(1,1),(2,1),(3,1)\} = 1111.$$
∎
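The displacement and concatenation operations can be modelled directly if a function $b \in \mathcal{G}^{\dagger}$ is stored as a Python dict from cell index to symbol. This is a minimal sketch under that modelling assumption (all names are hypothetical); it reproduces Examples 5.1 and 5.2:

```python
# A memory fragment b in G^dagger is modelled as a dict {cell: symbol}.
Memory = dict[int, int]


def displace(b: Memory, k: int) -> Memory:
    """{b}_k: move every (cell, symbol) pair of b up by k cells."""
    return {n + k: sym for n, sym in b.items()}


def concat(b1: Memory, b2: Memory) -> Memory:
    """b1 . b2 = b1 union {b2}_{|b1|}, where |b1| is the number of cells b1 occupies."""
    out = dict(b1)
    out.update(displace(b2, len(b1)))
    return out


def from_str(s: str) -> Memory:
    """A string in G^* occupies cells 0, 1, ..., |s|-1."""
    return {i: int(ch) for i, ch in enumerate(s)}


def to_str(b: Memory) -> str:
    """Render a fragment that occupies exactly cells 0..|b|-1 as a string."""
    return "".join(str(b[i]) for i in range(len(b)))


# Example 5.1: displacing f(s_1) by 3
print(displace({0: 0, 2: 1, 4: 0, 5: 1}, 3))           # {3: 0, 5: 1, 7: 0, 8: 1}
# Example 5.2: f(a).f(b) and f(c).f(c)
print(to_str(concat(from_str("0"), from_str("10"))))   # 010
print(to_str(concat(from_str("11"), from_str("11"))))  # 1111
```

Concatenating with the empty fragment `{}` returns the other operand unchanged, matching $\lambda\cdot\lambda = \lambda$.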


Many commonly used representation schemes involve the concatenation of encodings of a set $X$. For instance, given a function $f:X\to\mathcal{G}^*$, it would seem natural to encode $x_1x_2\cdots x_k \in X^*$ as $f(x_1)\cdot f(x_2)\cdot\ldots\cdot f(x_k)$. Similarly, we could encode $X^*$ by placing each of $f(x_1), \ldots, f(x_k)$ into a fixed field. We illustrate these schemes in Examples 5.3a and 5.3b.


Example 5.3. Let $X = \{a,b,c,d\}$, $\mathcal{G} = \{0,1\}$, and consider a function $f:X\to\mathcal{G}^*$ defined by

f(a) = 00
f(b) = 010
f(c) = 011
f(d) = 10

Assume that the domain is of the form $\mathbb{D} = \bigcup_{i\in J} X^i$ and we want to define a mapping from $\mathbb{D}$ to $\mathcal{G}^{\dagger}$.
(a) Consider the function $f_1:\mathbb{D}\to\mathcal{G}^*$, where
$$f_1(d) = f(d(1))\cdot f(d(2))\cdot\ldots\cdot f(d(|d|)).$$
For instance,

f_1(abad) = f(a)·f(b)·f(a)·f(d) = 000100010
f_1(bdb) = f(b)·f(d)·f(b) = 01010010
f_1(λ) = λ


Notice that, for $|J| > 1$, $f_1$ is not a representation because there is no way to recognize the end of the string $f_1(d)$; e.g., $f_1(b)$ and $f_1(ba)$ are indistinguishable since $f_1(b) \subset f_1(ba)$.

(b) Consider the function $f_2:\mathbb{D}\to\mathcal{G}^{\dagger}$, where
$$f_2(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{3(i-1)}.$$
Then

f_2(abad) = 00_01000_10_
f_2(bdb) = 01010_010
f_2(λ) = λ

As in the case of $f_1$, the function $f_2$ is a representation if and only if $|J| = 1$. ∎
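The contrast between $f_1$ and $f_2$ can be seen by computing both encodings. A minimal sketch, assuming bit strings for the contiguous encoding $f_1$ and a dict of cells for $f_2$ (printed with `_` for unused cells); the helper names are hypothetical:

```python
F = {"a": "00", "b": "010", "c": "011", "d": "10"}  # f from Example 5.3
FIELD_WIDTH = 3  # longest codeword, so field i starts at cell 3(i-1)


def f1(word: str) -> str:
    """f_1(d) = f(d(1)) . f(d(2)) . ... . f(d(|d|)), written contiguously."""
    return "".join(F[x] for x in word)


def f2(word: str) -> dict[int, str]:
    """f_2(d) = union of {f(d(i))}_{3(i-1)}: one fixed-width field per element."""
    cells = {}
    for i, x in enumerate(word):  # i is 0-based here, so the offset is 3*i
        for j, ch in enumerate(F[x]):
            cells[FIELD_WIDTH * i + j] = ch
    return cells


def show(cells: dict[int, str]) -> str:
    """Print cells 0..max, writing '_' for cells outside the domain."""
    if not cells:
        return ""
    return "".join(cells.get(n, "_") for n in range(max(cells) + 1))


print(f1("abad"), f1("bdb"))              # 000100010 01010010
print(show(f2("abad")), show(f2("bdb")))  # 00_01000_10 01010_010
print(f1("ba").startswith(f1("b")))       # True: f_1(b) is a prefix of f_1(ba)
```

The final line is exactly the ambiguity noted for $|J| > 1$: nothing in $f_1(b)$ tells a reader that the list has ended.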


In the previous example, $f_1$ and $f_2$ would be representations, even for $|J| > 1$, if there were some way of detecting the ends of the codewords $f_1(d)$ and $f_2(d)$. In particular, we might reserve some symbol to mark the end of the list, or we might give some specification of the length $|d|$ or the length $|\rho(d)|$.

Many representations that we consider are what we call concatenation-preserving, where the encoding of a list includes the encodings of the individual elements in the list. We now generalize the familiar notion of concatenation of encodings of list elements so that it does not necessarily imply a "left to right" ordering, only that the encodings are in disjoint sets of memory cells. Thus, if we know where to look, then it is possible to determine $d(i)$ and obtain no information about $d(j)$, for $1 \le i, j \le |d|$, $i \neq j$.


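Definition. Let $\mathbb{D} = \bigcup_{i\in J} X^i$ and consider a function $f:X\to\mathcal{G}^{\dagger}$. Define the function $f^{n}:\mathbb{D}\to\mathcal{G}^{\dagger}$ by
$$f^{n}(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)},$$
where $n_i:\mathbb{D}\to\mathbb{N}$. Then $f^{n}$ is said to be a concatenation-preserving function if, for all $i \neq j$,
$$D\left(\{f(d(i))\}_{n_i(d)}\right) \cap D\left(\{f(d(j))\}_{n_j(d)}\right) = \emptyset.$$
Let $g$ be any function $g:\mathbb{D}\to\mathcal{G}^{\dagger}$ and let $f^{n}$ be the function defined above. Consider the function $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ defined by the union
$$\rho(d) = \{f^{n}(d)\}_{n'(d)} \cup \{g(d)\}_{n''(d)},$$
where $n':\mathbb{D}\to\mathbb{N}$ and $n'':\mathbb{D}\to\mathbb{N}$. If $\rho$ is, in fact, a representation and if
$$D\left(\{f^{n}(d)\}_{n'(d)}\right) \cap D\left(\{g(d)\}_{n''(d)}\right) = \emptyset,$$
then $\rho$ is said to be a concatenation-preserving representation.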


The condition that the domains of $\{f^{n}(d)\}_{n'(d)}$ and $\{g(d)\}_{n''(d)}$ not intersect guarantees that $f^{n}$ is, in fact, a concatenation of encodings of the list elements and that the representations of the list elements do not overlap. Notice that the function $g$ can be chosen in any way whatsoever, so long as the resulting union, $\rho$, is a representation. We now reconsider Example 5.3 and see that $f_1$ and $f_2$ are concatenation-preserving functions.
Example 5.4. The functions $f_1$ and $f_2$ from Example 5.3 are concatenation-preserving functions, since they fit the form of the above definition.

(a) Given the function $f:X\to\mathcal{G}^*$ as in Example 5.3, we can define $f_1:\mathbb{D}\to\mathcal{G}^{\dagger}$ by
$$f_1(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)}, \qquad\text{where } n_i(d) = \sum_{j=1}^{i-1} |f(d(j))|.$$
Since $n_{i+1}(d) - n_i(d) = |f(d(i))|$, it is clear that
$$D\left(\{f(d(i))\}_{n_i(d)}\right) \cap D\left(\{f(d(j))\}_{n_j(d)}\right) = \emptyset \quad (i \neq j).$$
(b) Recall that we defined $f_2:\mathbb{D}\to\mathcal{G}^{\dagger}$ by
$$f_2(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{3(i-1)}.$$
Since $\max_{x\in X} \max D(f(x)) = 2$, the domains do not intersect, and it is clear that $f_2$ is a concatenation-preserving function. ∎


Recalling Example 5.3, when $\mathbb{D} = X^k$ we know that $|d| = k$, and $f_1$ and $f_2$ are, in fact, representations. When we wish to allow $\mathbb{D} \neq X^k$, however, we may wish to consider one of the following three representation schemes.
(i) If $|\rho(d)|$ is of fixed size for all $d \in \mathbb{D}$, then there is no need to specify $|\rho(d)|$. Fixed length representations are discussed in detail in Section 5.2.
(ii) An endmarker representation reserves some symbol or set of symbols $b \in \mathcal{G}^{\dagger}$ to indicate the end of the list. A formal definition is given in Section 5.3.
(iii) We can encode the length $|d|$ itself and use this as a pointer. A pointer representation is defined formally in Section 5.4.

We illustrate endmarker and pointer representations in Examples 5.5a and 5.5b by extending the function $f_1$ from examples 5.3 and 5.4.



Example 5.5. (a) Recall from examples 5.3 and 5.4 the function $f:X\to\mathcal{G}^{\dagger}$ and the function $f_1:\mathbb{D}\to\mathcal{G}^{\dagger}$. We can then define the representation $\rho_1:\mathbb{D}\to\mathcal{G}^{\dagger}$ by
$$\rho_1(d) = \{f_1(d)\}_0 \cup \{g(d)\}_{n''(d)},$$
where $g:\mathbb{D}\to\mathcal{G}^2$ is defined by $g(d) = 11$, and where $n''(d) = \sum_{j=1}^{|d|} |f(d(j))|$.

Since we already know that $f_1$ is a concatenation-preserving function, we need only note that
$$D\left(\{g(d)\}_{n''(d)}\right) \cap D\left(\{f_1(d)\}_0\right) = \emptyset$$
in order to verify that $\rho_1$ is, in fact, a concatenation-preserving representation. For instance,

ρ_1(abad) = {f_1(abad)}_0 ∪ {g(abad)}_9 = {f(a)·f(b)·f(a)·f(d)}_0 ∪ {11}_9 = 00010001011
ρ_1(bdb) = f(b)·f(d)·f(b)·g(bdb) = 0101001011
ρ_1(c) = 01111
ρ_1(λ) = 11

Notice that $|\rho_1(d)| = \sum_{j=1}^{|d|} |f(d(j))| + |g(d)|$.

Since $g(d)$ and $f(x)$ are distinguishable for all $x \in X$, the string $g(d) = 11$ serves as an endmarker, allowing us to detect when the end of the list has been reached. However, since we also have, e.g., $f(c) = 011$, not every occurrence of the string 11 corresponds to the endmarker. It is necessary to somehow decode $\rho_1(d)$ as we read it.
(b) Recall the functions $f:X\to\mathcal{G}^*$ and $f_1:\mathbb{D}\to\mathcal{G}^{\dagger}$ from Example 5.3. Define the concatenation-preserving representation $\rho_2:\mathbb{D}\to\mathcal{G}^{\dagger}$ by
$$\rho_2(d) = \{f_1(d)\}_{|d|+1} \cup \{g(d)\}_0,$$
where $g:\mathbb{D}\to\mathcal{G}^*$ is defined by
$$g(d) = 1^{|d|}0.$$
Notice that $g(d)$ corresponds to the length $|d|$. Thus, after reading $g(d)$, we shall always be able to tell when we are at the end of the list representation $\rho_2(d)$. For instance,

ρ_2(abad) = 11110000100010
ρ_2(bdb) = 111001010010
ρ_2(λ) = 0

We shall later discuss more "efficient" pointer representations. ∎
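The two schemes of Example 5.5 can be turned into small decoders. In this sketch (bit strings, hypothetical function names, codeword table of Example 5.3), the endmarker decoder has to parse every codeword to find `11`, because 11 also occurs inside $f(c) = 011$, whereas the pointer decoder learns $|d|$ from the unary prefix $1^{|d|}0$ before reading any codeword:

```python
F = {"a": "00", "b": "010", "c": "011", "d": "10"}  # f from Example 5.3
INV = {code: x for x, code in F.items()}
ENDMARKER = "11"                                    # g(d) in Example 5.5(a)


def encode_rho1(word: str) -> str:
    """Endmarker scheme rho_1: f_1(d) followed by g(d) = 11."""
    return "".join(F[x] for x in word) + ENDMARKER


def decode_rho1(bits: str) -> str:
    """Read codewords left to right. {00, 010, 011, 10, 11} is prefix-free,
    so the first complete codeword read is unambiguous."""
    out, buf = [], ""
    for ch in bits:
        buf += ch
        if buf == ENDMARKER:
            return "".join(out)
        if buf in INV:
            out.append(INV[buf])
            buf = ""
    raise ValueError("no endmarker found")


def encode_rho2(word: str) -> str:
    """Pointer scheme rho_2: g(d) = 1^{|d|} 0 in front of f_1(d)."""
    return "1" * len(word) + "0" + "".join(F[x] for x in word)


def decode_rho2(bits: str) -> tuple[str, int]:
    """Return (d, |rho_2(d)|); the unary prefix says how many codewords follow."""
    length = bits.index("0")
    pos, out = length + 1, []
    for _ in range(length):
        buf = ""
        while buf not in INV:
            buf += bits[pos]
            pos += 1
        out.append(INV[buf])
    return "".join(out), pos


print(encode_rho1("abad"), encode_rho1("c"))   # 00010001011 01111
print(decode_rho1("01111"))                    # c
print(encode_rho2("bdb"), decode_rho2("111001010010"))  # 111001010010 ('bdb', 12)
```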


If in a concatenation-preserving function the functions $n_i$ are all constant functions (i.e., the values of $n_i$ do not depend on the particular $d$ being represented), then we say that the function has fixed position fields. Intuitively, this says that if we were to ask the question $\gamma_i$, for $i \le |d|$, then we would always know where in the representation to begin reading.


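Definition. Let $\mathbb{D} = \bigcup_{i\in J} X^i$ and let $f$ be a function $f:X\to\mathcal{G}^{\dagger}$. Consider a concatenation-preserving function $f^{n}:\mathbb{D}\to\mathcal{G}^{\dagger}$ defined by
$$f^{n}(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)},$$
where $n_i:\mathbb{D}\to\mathbb{N}$. If for all $d_1, d_2 \in \mathbb{D}$ and for all $i$, $1 \le i \le \min(|d_1|, |d_2|)$,
$$n_i(d_1) = n_i(d_2),$$
then the function $f^{n}$ is said to have fixed position fields. We define an $n_i$ field to be the set
$$\bigcup_{x\in X} D(f(x)) + n_i$$
for $1 \le i \le |d|$, where we use the notation
$$\{s_1, s_2, \ldots, s_t\} + k \triangleq \{s_1+k, s_2+k, \ldots, s_t+k\}.$$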


Clearly neither of the extensions of $f_1$ in Example 5.5 gives us a function with fixed position fields. The function $f_2$ from Example 5.3 does, however, have fixed position fields, since $n_i(d) = 3(i-1)$ for all $d \in \mathbb{D}$ and thus each $n_i$ is a constant function. Since each $n_i$ field consists of all cells which may be occupied by $f(d(i))$, the $n_i$ field for $f_2$ of Example 5.3 is just $\{n_i, n_i+1, n_i+2\}$. Notice that it is not necessary that an $n_i$ field consist of contiguous memory cells, although for simplicity most of our examples will be of this form. In fact, it is possible for two $n_i$ fields to "cross"; e.g., we might have
$$k_1, k_3 \in \bigcup_{x\in X} D(f(x)) + n_i \quad\text{and}\quad k_2 \in \bigcup_{x\in X} D(f(x)) + n_j,$$
for $k_1 < k_2 < k_3$. Example 5.6 gives an example of a concatenation-preserving representation with fixed position fields, where a field does not consist of contiguous cells.


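Example 5.6. Let $X = \{a,b,c\}$, $\mathcal{G} = \{0,1\}$, and $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. Define the function $f:X\cup\{\varphi\}\to\mathcal{G}^{\dagger}$, where each $f(x)$ occupies cells 0 and 2 (so that $D(f(x)) = \{0, 2\}$), by

f(a) = 0_0
f(b) = 0_1
f(c) = 1_0
f(φ) = 1_1

Consider the representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ defined by
$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(\varphi)\}_{n_{|d|+1}},$$
where
$$n_i = \begin{cases} 2i-3 & \text{for } i \text{ even},\\ 2i-2 & \text{for } i \text{ odd}. \end{cases}$$
Thus, $d(1)$ occupies cells 0 and 2, $d(2)$ occupies cells 1 and 3, $d(3)$ occupies cells 4 and 6, $d(4)$ occupies cells 5 and 7, $d(5)$ occupies cells 8 and 10, etc. For instance,

ρ(λ) = 1_1
ρ(abaa) = 000100001_1
ρ(bacba) = 001010010101

So an $n_i$ field is not a set of contiguous cells. In fact, the $n_3$ field is
$$\bigcup_{x\in X} D(f(x)) + n_3 = \bigcup_{x\in X} D(f(x)) + 4 = \{4, 6\}$$
and the $n_4$ field is
$$\bigcup_{x\in X} D(f(x)) + n_4 = \bigcup_{x\in X} D(f(x)) + 5 = \{5, 7\}.$$
Notice that the $n_3$ field and the $n_4$ field "cross", since
$$4, 6 \in \bigcup_{x\in X} D(f(x)) + n_3 \quad\text{and}\quad 5 \in \bigcup_{x\in X} D(f(x)) + n_4.$$
∎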
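The interleaved layout of Example 5.6 is easy to get wrong by hand. The following sketch (hypothetical names; the key `phi` stands for the end marker $\varphi$) places each $f(x)$ in cells $n_i$ and $n_i + 2$ and prints `_` for cells outside $D(\rho(d))$:

```python
F = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "phi": (1, 1)}  # f(x) occupies cells 0 and 2


def offset(i: int) -> int:
    """n_i of Example 5.6 (i is 1-based): 2i-3 for even i, 2i-2 for odd i."""
    return 2 * i - 3 if i % 2 == 0 else 2 * i - 2


def field(i: int) -> set[int]:
    """The n_i field: D(f(x)) + n_i = {0, 2} + n_i."""
    return {offset(i), offset(i) + 2}


def rho(word: str) -> str:
    """Write d(1..|d|) and then f(phi) into their fields; '_' marks unused cells."""
    cells = {}
    for i, x in enumerate(list(word) + ["phi"], start=1):
        first, second = F[x]
        cells[offset(i)] = first
        cells[offset(i) + 2] = second
    return "".join(str(cells[k]) if k in cells else "_" for k in range(max(cells) + 1))


print(rho(""), rho("abaa"), rho("bacba"))  # 1_1 000100001_1 001010010101
print(sorted(field(3)), sorted(field(4)))  # [4, 6] [5, 7]
```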


By definition it is, of course, not possible for [illegible in the source].
In sections 5.3 and 5.4 we extend the notion of a fixed position field function.

5.2 Fixed Size Representations


In theorems 4.9 and 4.10 we showed that if a representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ achieves Kraft storage and also achieves Kraft access for all $\gamma_i \in \Gamma$, then $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X^n$. In this section we show that it is possible, where $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X$, to have Kraft storage and access with a fixed size representation. In fact, if the relative sizes of the problem and machine alphabets are chosen correctly, and if the domain is of one of the two appropriate forms, then there is always a fixed size representation which achieves Kraft storage and access (see Theorem 5.5 and Corollary 5.5.1).

Recalling Section 5.1, a representation $\rho$ is said to be of fixed size if it maps all strings in $\mathbb{D}$ into strings of the same length.


Definition. A representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ is a fixed size representation function of size $r$ if and only if
$$(\forall d \in \mathbb{D})(|\rho(d)| = r).$$
Notice that the definition makes no requirement that $D(\rho(d)) = \{0, 1, \ldots, r-1\}$, and in general $\rho(d)$ might occupy any $r$ cells of memory, not necessarily contiguous. Of course, we frequently consider a representation $\rho:\mathbb{D}\to\mathcal{G}^r$, where each $d \in \mathbb{D}$ is mapped onto a sequence $m = m(1)m(2)\cdots m(r) = \rho(d)$, for $m(i) \in \mathcal{G}$. For any fixed size representation, however, it is known that each $\rho(d)$ occupies exactly $r$ cells, and so it is not necessary to store any additional information concerning the length of the representation. Let us look at two examples of fixed size representations.


Example 5.7. Let $\mathbb{D} = \{\lambda\} \cup X \cup X^2$, $X = \{a,b\}$, and $\mathcal{G} = \{0,1\}$. Define the fixed size representation $\rho:\mathbb{D}\to\mathcal{G}^3$ as follows:

ρ(λ) = 000
ρ(a) = 001
ρ(b) = 010
ρ(aa) = 011
ρ(ab) = 100
ρ(ba) = 101
ρ(bb) = 110

Since there is no $d \in \mathbb{D}$ such that $\rho(d) = 111$, $\rho$ does not achieve Kraft storage. Also, it is not possible, using representation $\rho$, to implement any $\gamma_i \in \Gamma$ so as to achieve Kraft access. (If $\gamma_i$ did achieve Kraft access, then the tree for $\gamma_i$ would have three leaves and therefore two internal nodes. So one answer among $a$, $b$, $\varphi$ would be determined in a single access, but by inspection we can see that this cannot happen.) ∎
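Kraft storage asks that $\sum_{d\in\mathbb{D}} |\mathcal{G}|^{-|\rho(d)|} = 1$. A minimal exact-arithmetic check using Python's standard `fractions` module, applied to the table of Example 5.7 (the helper name is ours):

```python
from fractions import Fraction


def kraft_sum(codewords, alphabet_size: int = 2) -> Fraction:
    """Sum of |G|^(-|rho(d)|) over the given representations, computed exactly."""
    return sum((Fraction(1, alphabet_size ** len(w)) for w in codewords), Fraction(0))


rho = {"": "000", "a": "001", "b": "010", "aa": "011",
       "ab": "100", "ba": "101", "bb": "110"}
print(kraft_sum(rho.values()))  # 7/8: the unused word 111 accounts for the missing 1/8
```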


Example 5.8 illustrates a procedure for constructing a fixed size representation for which, if $\mathbb{D} \neq X^n$ and $|X| = |\mathcal{G}|^k - 1$, we can attain Kraft access (although not Kraft storage). Notice that $r = k\,|\Gamma|$, and we answer $\gamma_i$ by first accessing cell $(i-1)k$.

Example 5.8. Let $\mathbb{D} = \bigcup_{i=0}^{n} X^i$, $X = \{a,b,c\}$, and $\mathcal{G} = \{0,1\}$. Define the fixed size concatenation-preserving representation $\rho:\mathbb{D}\to\mathcal{G}^{2n}$ by
$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{2(i-1)} \cup \bigcup_{i=|d|+1}^{n} \{f(\varphi)\}_{2(i-1)},$$
where $f:X\cup\{\varphi\}\to\mathcal{G}^2$ is defined by

f(a) = 00
f(b) = 01
f(c) = 10
f(φ) = 11

In particular, for $n = 2$ we have

ρ(λ) = 1111
ρ(a) = 0011
ρ(b) = 0111
ρ(c) = 1011
ρ(aa) = 0000
ρ(ab) = 0001
ρ(ac) = 0010
ρ(ba) = 0100
ρ(bb) = 0101
ρ(bc) = 0110
ρ(ca) = 1000
ρ(cb) = 1001
ρ(cc) = 1010

Notice that $|\mathcal{G}| = 2$ and $2^2 - 1 = 3 = |X|$. So $r = 2 \cdot 2$, and to answer $\gamma_i$ we first access cell $2(i-1)$. Figure 5.1 illustrates access trees for $\gamma_1$ and $\gamma_2$, and it is clear that we achieve Kraft access. On the other hand, $\rho$ does not have Kraft storage because
$$13 \cdot 2^{-4} = \tfrac{13}{16} < 1.$$

Intuitively, we would have achieved Kraft storage if we had altered the definition of $\rho$ by letting $\rho(\lambda) = 11$; this would have made $\rho(\mathbb{D})$ a complete code. Instead, we chose to specify values for $m(2)$ and $m(3)$ so we could always answer $\gamma_2$ in two accesses. This illustrates a trade-off between Kraft storage and Kraft access. ∎

Figure 5.1. Access trees for $\gamma_1$ and $\gamma_2$ of Example 5.8.
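A sketch of Example 5.8 with $n = 2$ (helper names hypothetical) builds $\rho$, answers $\gamma_i$ by reading the two cells starting at $2(i-1)$, and computes the Kraft sum exactly, including the effect of the shorter $\rho(\lambda) = 11$ discussed above:

```python
from fractions import Fraction
from itertools import product

F = {"a": "00", "b": "01", "c": "10", "phi": "11"}  # f from Example 5.8
INV = {code: x for x, code in F.items()}


def rho(word: str, n: int = 2) -> str:
    """Field i (cells 2(i-1) and 2(i-1)+1) holds f(d(i)), or f(phi) when i > |d|."""
    return "".join(F[word[i]] if i < len(word) else F["phi"] for i in range(n))


def gamma(i: int, bits: str) -> str:
    """Answer gamma_i with exactly two accesses, starting at cell 2(i-1)."""
    return INV[bits[2 * (i - 1): 2 * i]]


domain = [""] + ["".join(p) for k in (1, 2) for p in product("abc", repeat=k)]
total = sum((Fraction(1, 2 ** len(rho(d))) for d in domain), Fraction(0))
print(len(domain), total)                        # 13 13/16

# Shortening rho(lambda) from 1111 to 11 would restore Kraft storage:
print(total - Fraction(1, 16) + Fraction(1, 4))  # 1
```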


Notice that when we define some fixed size representation $\rho$, we have not explicitly said anything about the elements in the problem domain $\mathbb{D}$. If, however, we meet Kraft storage, then we know by the following theorem that there are $|\mathcal{G}|^r$ elements in the domain.

Theorem 5.1. Let $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ be a fixed size representation of size $r$. Then $\rho$ achieves Kraft storage if and only if $|\mathbb{D}| = |\mathcal{G}|^r$.


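Proof: Since $|\rho(d)| = r$ for all $d \in \mathbb{D}$,
$$\sum_{d\in\mathbb{D}} |\mathcal{G}|^{-|\rho(d)|} = |\mathbb{D}| \cdot |\mathcal{G}|^{-r},$$
and we have Kraft storage if and only if $|\mathbb{D}| \cdot |\mathcal{G}|^{-r} = 1$; that is, if and only if $|\mathbb{D}| = |\mathcal{G}|^r$. ∎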


Notice that we could, of course, be representing any $|\mathcal{G}|^r$ strings in $\mathbb{D}$. We know by Theorem 3.10 that we cannot achieve Kraft access for $|X| < |\mathcal{G}|$. Unfortunately, even for $|X| \ge |\mathcal{G}|$, the conditions $|\mathbb{D}| = |\mathcal{G}|^r$ and $\mathbb{D} = \bigcup_{i\in J} X^i$ do not guarantee that there is a fixed size representation that attains Kraft access.


Example 5.9. Let $|\mathcal{G}| = 3$, $|X| = 4$, and $\mathbb{D} = \{\lambda\} \cup X^2 \cup X^3$. Then for $r = 4$ we have
$$|\mathcal{G}|^4 = 3^4 = 81 = 1 + 4^2 + 4^3 = |\mathbb{D}|,$$
and a fixed size representation $\rho:\mathbb{D}\to\mathcal{G}^4$ is storage optimal. On the other hand, by theorems 4.9 and 4.10 we know that there is no representation, fixed size or otherwise, that achieves both Kraft storage and Kraft access for the table lookup question set $\Gamma = \{\gamma_1, \gamma_2, \gamma_3\}$. ∎


In the last chapter we showed that in order to possibly achieve Kraft storage and Kraft access, it must be the case that $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X^n$. If we wish a fixed size representation to have Kraft storage and access, then either $\mathbb{D} = X^n$ or else we have the less interesting situation where $\mathbb{D} = \{\lambda\} \cup X^1$.


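Lemma 5.1. Let $\mathbb{D} = \{\lambda\} \cup X^n$ and consider a fixed size representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$, of size $r$, which achieves Kraft storage. Assume also that each $\gamma_i \in \Gamma$ achieves Kraft access. Then $|\mathcal{G}| = 2$ and $|\Gamma| = 1$; i.e., $\mathbb{D} = \{\lambda\} \cup X^1$ and $|X| = 1$.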


Proof: By Theorem 4.9 and Corollary 4.9.1, since $\mathbb{D} \neq X^n$, the only way we can achieve both Kraft storage and access is to have $|\mathcal{G}| = 2$ and for there to be some $\gamma_i, \gamma_j \in \Gamma$ such that $\gamma_i$ and $\gamma_j$ have overlapping access sets. As a consequence of Theorem 4.8, we know that $\gamma_i$ and $\gamma_j$ access some cell in common if and only if $X^k \notin \mathbb{D}$ for all $i \le k < j$. Since $\mathbb{D} = \{\lambda\} \cup X^n$, any pair of table lookup questions has overlapping access sets. Now lemmas 4.2 and 4.4 allow us to conclude that each $\gamma_i \in \Gamma$ has the same root node label and each has a leaf labelled $\varphi$ at depth 1. But this says that $|\rho(\lambda)| = 1$, and so if $\rho$ is a fixed size representation then it is a fixed size representation of size 1. Thus $|\Gamma| = 1$. If $\gamma_1$ has a leaf labelled $a$ at depth greater than 1, then $|\rho(\lambda)| \neq |\rho(a)|$ and $\rho$ could not be a fixed size representation. Thus $\gamma_1$ has its only two leaves at depth one, and so we have the trivial case $\mathbb{D} = \{\lambda\} \cup X^1$ and $|X| = 1$. ∎


Of course, the above lemma simply says that if a fixed size representation achieves Kraft storage and access, then $\mathbb{D} = \{\lambda\} \cup X$. The following example shows that it is, in fact, possible to have $\mathbb{D} = \{\lambda\} \cup X$ for a fixed size representation which does have Kraft storage and access.


Example 5.10. Let $\mathcal{G} = \{0,1\}$, $X = \{a,b,c\}$, and $\mathbb{D} = \{\lambda\} \cup X$. Define the representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ by

ρ(λ) = 0_1
ρ(a) = 0_0
ρ(b) = 10_
ρ(c) = 11_

Clearly $\rho$ is a storage optimal fixed size representation of size 2, and from Figure 5.2 we see that it is possible to implement $\gamma_1$ so that it has Kraft access. ∎

Figure 5.2. Access tree for $\gamma_1$ of Example 5.10.


Now from Lemma 5.1 and theorems 4.9 and 4.10 we obtain the following result. 
Theorem 5.2. Consider a fixed size representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$. Assume $\rho$ achieves Kraft storage and each $\gamma_i \in \Gamma$ achieves Kraft access. Then $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X$.


In fact, achieving Kraft storage and access with a fixed size representation tells us something about the relative sizes of the problem and machine alphabets.

Lemma 5.2. Consider a fixed size representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ which achieves Kraft storage, and assume that all $\gamma_i \in \Gamma$ achieve Kraft access. Then, for all $\gamma_i \in \Gamma$, the access tree for $\gamma_i$ has uniform depth.


Proof: By Theorem 5.2, there are two cases to consider:

(i) $\mathbb{D} = \{\lambda\} \cup X$. In this case we know by Lemma 5.1 that $|\Gamma| = 1$ and $|X| = 1$, so $\gamma_1$ clearly has uniform depth and $\rho$ is a fixed size representation.

(ii) $\mathbb{D} = X^n$. Then by Theorem 4.10 there are no overlapping access sets. Assume there is some $\gamma_n$ whose access tree does not have uniform depth; in particular, let leaves labelled $x_1, x_2 \in X$ be at different depths. Then there exist $d_1, d_2 \in \mathbb{D}$ such that $d_1(n) = x_1$ and
$$d_2(i) = \begin{cases} d_1(i) & \text{for } i \neq n,\\ x_2 & \text{for } i = n. \end{cases}$$
By Theorem 4.12,
$$|\rho(d_1)| = \sum_{i\neq n} a[\gamma_i(\rho(d_1))] + a[\gamma_n(\rho(d_1))]$$
and
$$|\rho(d_2)| = \sum_{i\neq n} a[\gamma_i(\rho(d_2))] + a[\gamma_n(\rho(d_2))].$$
The two sums over $i \neq n$ are equal, since $d_1(i) = d_2(i)$ for $i \neq n$. But
$$a[\gamma_n(\rho(d_1))] = a[x_1] \neq a[x_2] = a[\gamma_n(\rho(d_2))].$$
Thus $|\rho(d_1)| \neq |\rho(d_2)|$, implying that $\rho$ is not a fixed size representation, a contradiction. So each $\gamma_n$ has a tree of uniform depth. ∎
This lemma allows us to prove that $|X| = |\mathcal{G}|^k$ or $|X| = |\mathcal{G}|^k - 1$ if we are to attain Kraft storage and access with a fixed size representation.

Theorem 5.3. Consider a fixed size representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ which achieves Kraft storage, and assume all $\gamma_i \in \Gamma$ achieve Kraft access. If $\mathbb{D} = X^n$ then $|\mathcal{G}|^k = |X|$ for some $k \in \mathbb{N}$, and if $\mathbb{D} = \{\lambda\} \cup X$ then $|\mathcal{G}|^k = |X| + 1$.

Proof: Let $\mathbb{D} = X^n$. By Lemma 5.2, we know that the access tree for $\gamma_i$ has uniform depth, say $k$, and so $|\mathcal{G}|^k = |R(\gamma_i(\mathbb{D}))| = |X|$. Similarly, for $\mathbb{D} = \{\lambda\} \cup X$, $|R(\gamma_1(\mathbb{D}))| = |X| + 1 = |\mathcal{G}|^k$. ∎


The following example illustrates, however, that attaining Kraft storage and access, even where $\mathbb{D} = X^n$ and $|X| = |\mathcal{G}|^k$, does not necessarily mean our representation has fixed size.

Example 5.11. Let $\mathcal{G} = \{0,1\}$, $X = \{a,b,c,d\}$, and $\mathbb{D} = X^2$. Define the representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ as illustrated in Figure 5.3. More specifically, for $x_1x_2 \in X^2$ we can let $\rho(x_1x_2) = f_1(x_1) \cup f_2(x_2)$, where the representation tree for $f_i$ has the same form as the access tree for $\gamma_i$. For instance,

ρ(ac) = 0__10
ρ(ad) = 0__11
ρ(bb) = 10_01
ρ(dc) = 11110

From the trees it is clear that $\gamma_1$ and $\gamma_2$ achieve Kraft access. Also, since
$$|\rho(x_1x_2)| = |f_1(x_1)| + |f_2(x_2)| = |f_1(x_1)| + 2,$$
the reader can verify that
$$\sum_{d\in X^2} 2^{-|\rho(d)|} = 1,$$
and so $\rho$ achieves Kraft storage. ∎
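A quick check of Example 5.11. The code tables below are assumptions: $f_1(a) = 0$, $f_1(b) = 10$, and $f_1(d) = 111$ are read off the listed values of $\rho$, $f_1(c) = 110$ is the only choice that completes the tree, and $f_2$ is taken to be the uniform depth-2 code stored in cells 3 and 4:

```python
from fractions import Fraction
from itertools import product

# Assumed codes (see lead-in): F1 is variable length from cell 0; F2 lives in cells 3, 4.
F1 = {"a": "0", "b": "10", "c": "110", "d": "111"}
F2 = {"a": "00", "b": "01", "c": "10", "d": "11"}


def rho(x1: str, x2: str) -> dict[int, str]:
    """rho(x1 x2) = f_1(x1) starting at cell 0, plus f_2(x2) in cells 3 and 4."""
    cells = {i: ch for i, ch in enumerate(F1[x1])}
    cells.update({3 + i: ch for i, ch in enumerate(F2[x2])})
    return cells


def show(cells: dict[int, str]) -> str:
    """Print cells 0..max, writing '_' for unused cells."""
    return "".join(cells.get(k, "_") for k in range(max(cells) + 1))


pairs = list(product("abcd", repeat=2))
total = sum((Fraction(1, 2 ** len(rho(x1, x2))) for x1, x2 in pairs), Fraction(0))
print(show(rho("a", "c")), show(rho("b", "b")), show(rho("d", "c")))  # 0__10 10_01 11110
print(total, sorted({len(rho(x1, x2)) for x1, x2 in pairs}))          # 1 [3, 4, 5]
```

The Kraft sum is exactly 1 even though the sizes range over 3, 4, and 5 cells, which is the point of the example.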


Figure 5.3. Access trees for $\gamma_1$ and $\gamma_2$ of Example 5.11.

On the other hand, the following example shows that we could have defined $\rho$ in the above example to be a fixed size representation and still have attained Kraft storage and access. In fact, there would always be such a fixed size representation.


Example 5.12. Let $\mathcal{G}$, $X$, and $\mathbb{D}$ be the same as in Example 5.11 and define the representation $\rho:\mathbb{D}\to\mathcal{G}^4$ as illustrated in Figure 5.4. For instance,

ρ(ac) = 0010
ρ(ad) = 0011
ρ(bb) = 0101
ρ(dc) = 1110

Figure 5.4. Access trees for $\gamma_1$ and $\gamma_2$ of Example 5.12.

Clearly we achieve both Kraft access and Kraft storage. ∎


To help motivate some further discussions, we first prove the following simple lemma.

Lemma 5.3. Let $\mathbb{D} = X^n$. Then the following statements are equivalent.

(1) There is some representation $\rho:X\to\mathcal{G}^{\dagger}$ which attains Kraft storage.

(2) There is some implementation for which each $\gamma_i \in \Gamma$ achieves Kraft access.

(3) There is some $k \in \mathbb{N}$ such that $|X| = k(|\mathcal{G}| - 1) + 1$.

Proof: For $\mathbb{D} = X^n$, $R(\gamma_i(\mathbb{D})) = X$. There is some representation $\rho$ which attains Kraft storage for $x \in X$ if and only if there is a complete $|\mathcal{G}|$-ary tree with $|X|$ leaves, if and only if $|X| = k(|\mathcal{G}| - 1) + 1$. Also, $\gamma_i$ achieves Kraft access if and only if its $|\mathcal{G}|$-ary tree is complete with $|X|$ leaves, if and only if $|X| = k(|\mathcal{G}| - 1) + 1$. ∎
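Condition (3) of Lemma 5.3 is a divisibility test: a complete $|\mathcal{G}|$-ary tree gains $|\mathcal{G}| - 1$ leaves per internal node. A one-function sketch (the name is hypothetical):

```python
def complete_tree_exists(num_leaves: int, alphabet_size: int) -> bool:
    """Condition (3) of Lemma 5.3: num_leaves = k(|G| - 1) + 1 for some k in N."""
    if alphabet_size < 2 or num_leaves < 1:
        raise ValueError("need |G| >= 2 and at least one leaf")
    return (num_leaves - 1) % (alphabet_size - 1) == 0


print(complete_tree_exists(5, 2))   # True  (every |X| works for a binary alphabet)
print(complete_tree_exists(7, 5))   # False (4k + 1 = 7 has no solution)
print(complete_tree_exists(49, 5))  # True  (k = 12, the X^2 case of Example 5.13)
```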


It is not the case, however, that Kraft storage for a representation $\rho:X^n\to\mathcal{G}^{\dagger}$ implies Kraft storage for some representation $\rho:X\to\mathcal{G}^{\dagger}$.

Example 5.13. Let $|\mathcal{G}| = 5$ and $|X| = 7$. To get Kraft storage for $X$, we would need $i(|\mathcal{G}| - 1) + 1 = 4i + 1 = 7$, which is not possible. But for $X^2$, $i = 12$ gives us
$$12(|\mathcal{G}| - 1) + 1 = 49 = |X|^2.$$
∎


We are now ready to prove the main results of this section. The proof of the following lemma is essentially the same as the proof of Lemma 4.3.

Lemma 5.4. Let $X = \{x_1, x_2, \ldots, x_k\}$ and $\mathbb{D} = X^n$. Consider a representation $f:X\to\mathcal{G}^{\dagger}$. If $f$ achieves Kraft storage, then a concatenation-preserving representation $\rho:\mathbb{D}\to\mathcal{G}^{\dagger}$ defined by
$$\rho(d) = \bigcup_{i=1}^{n} \{f(d(i))\}_{n_i(d)},$$
where $n_i:\mathbb{D}\to\mathbb{N}$, also achieves Kraft storage.


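Proof: By induction on $n$ we prove that
$$\sum_{d \in X^n} |\mathcal{B}|^{-|\rho(d)|} = 1. \qquad (5.1)$$
Basis: For $n = 1$, $|\rho(d)| = |f(d)|$ and so
$$\sum_{d \in X} |\mathcal{B}|^{-|\rho(d)|} = \sum_{d \in X} |\mathcal{B}|^{-|f(d)|} = \sum_{x \in X} |\mathcal{B}|^{-|f(x)|} = 1.$$
Induction step: Assume that (5.1) holds for $n$. Since the fields of a concatenation-preserving representation do not overlap, $|\rho(d)| = \sum_{i} |f(d(i))|$, and so
$$\sum_{d \in X^{n+1}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{d \in X^{n}} \sum_{j=1}^{k} |\mathcal{B}|^{-\left(\sum_{i=1}^{n}|f(d(i))| + |f(x_j)|\right)} = \sum_{j=1}^{k} |\mathcal{B}|^{-|f(x_j)|} \sum_{d \in X^n} |\mathcal{B}|^{-\sum_{i=1}^{n}|f(d(i))|}.$$
By our inductive hypothesis this then gives us
$$\sum_{d \in X^{n+1}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{x \in X} |\mathcal{B}|^{-|f(x)|} = 1. \quad □$$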
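The identity (5.1) can also be checked by brute force for small $n$. The sketch below (Python, our own illustration) sums $|\mathcal{B}|^{-|\rho(d)|}$ over all of $X^n$ with exact fractions, taking $|\rho(d)| = \sum_i |f(d(i))|$ as in the proof. The codeword lengths are those of the binary code of Example 5.14; a second, non-Kraft set of lengths shows that the sum then stays below 1.

```python
from fractions import Fraction
from itertools import product


def kraft_sum_over_lists(lengths, arity, n):
    """Sum of arity**(-|rho(d)|) over all d in X^n, where lengths[j] = |f(x_j)|
    and |rho(d)| is the sum of the field lengths (fields never overlap)."""
    total = Fraction(0)
    for fields in product(lengths, repeat=n):   # one length per list position
        total += Fraction(1, arity ** sum(fields))
    return total


kraft_lengths = [2, 2, 3, 3, 2]          # |f(a)|, ..., |f(e)| in Example 5.14
for n in range(1, 4):
    print(n, kraft_sum_over_lists(kraft_lengths, 2, n))   # prints 1 each time

print(kraft_sum_over_lists([1, 2], 2, 2))  # 9/16 = (1/2 + 1/4)**2
```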


Theorem 5.4. Let $\mathbb{D} = X^n$. If there exists some $k \in \mathbb{N}$ for which $|X| = k(|\mathcal{B}|-1)+1$, then there is an implementation $(Q, \rho)$ solving $(\Gamma, \mathbb{D})$ such that $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ achieves Kraft storage and each $\gamma_i \in \Gamma$ achieves Kraft access.


Proof: Since $|X| = k(|\mathcal{B}|-1)+1$, we know by Lemma 3.1 that there is a $|\mathcal{B}|$-ary tree $T$ with $|X|$ leaves and node labels chosen from the set $\{0, 1, \ldots, r\}$, for $r \le k-1$. We can use this tree $T$ to define the storage-optimal representation $f: X \to \mathcal{B}^{\dagger}$. Now define the concatenation-preserving representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ by
$$\rho(d) = \bigcup_{i=1}^{n} \{f(d(i))\}_{(r+1)(i-1)}.$$
By Lemma 5.4, since $f$ achieves Kraft storage, so does $\rho$. Also, if we implement $\gamma_i \in \Gamma$ by the same tree $T$ except replacing node label $j$ by label $j + (r+1)(i-1)$, then each $\gamma_i \in \Gamma$ achieves Kraft access. □


By Lemma 5.3, Theorem 5.4 holds if, instead of the condition $|X| = k(|\mathcal{B}|-1)+1$, we have the condition that there be some representation $\rho: X \to \mathcal{B}^{\dagger}$ which attains Kraft storage, or that there be some implementation for $\gamma_i$ which achieves Kraft access. Trivially, the above theorem also holds for $\mathbb{D} = \{\lambda\} \cup X$ when $|X| + 1 = k(|\mathcal{B}|-1)+1$.


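Corollary 5.4.1. Let $\mathbb{D} = \{\lambda\} \cup X$. If there exists some $k \in \mathbb{N}^*$ for which $|X| + 1 = k(|\mathcal{B}|-1)+1$, then there is an implementation $(Q, \rho)$ solving $(\Gamma, \mathbb{D})$ such that $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ achieves Kraft storage and each $\gamma_i \in \Gamma$ achieves Kraft access.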


We present an example to illustrate how $\rho$ and $f$ in the proof of Theorem 5.4 might be chosen.


Example 5.14. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b,c,d,e\}$, and $\mathbb{D} = X^2$. Then $|X| = k(|\mathcal{B}|-1)+1$ is satisfied by $k = 4$, and there is a binary tree with five leaves and four internal nodes whose labels are in $\{0,1,2,3\}$. In fact, there are many such trees, and we (arbitrarily) pick $T$ to be the tree shown in Figure 5.5a. Using $T$, we define the representation $f: X \to \mathcal{B}^{\dagger}$ by

f(a) = 00__
f(b) = 01__
f(c) = 100_
f(d) = 101_
f(e) = 11__

By inspection, $f$ attains Kraft storage. We define the concatenation-preserving representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ by $\rho(d) = f(d(1)) \cdot f(d(2))$.
Figure 5.5. Trees for $T$ and $\gamma_2$ of Example 5.14.


So $T$ is also the access tree for $\gamma_1$, and the access tree for $\gamma_2$ is the same as $T$ but has each node label $j$ replaced by the label $4 + j$. The tree for $\gamma_2$ is illustrated in Figure 5.5b. Then we have, for instance, ρ(bc) = 01__100. The representation $\rho$ achieves Kraft storage because
$$\sum_{d \in X^2} 2^{-|\rho(d)|} = \left(\sum_{x \in X} 2^{-|f(x)|}\right)^2 = \left(3 \cdot 2^{-2} + 2 \cdot 2^{-3}\right)^2 = 1.$$
By inspection of the trees for $\gamma_1$ and $\gamma_2$, we also attain Kraft access. □
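To make Example 5.14 concrete, here is a small Python model in which a representation is a partial function from cell numbers to bits (a dict, with holes as missing keys). It is a sketch under our own conventions: the helper names are hypothetical, the field width 4 is $r+1$ from the proof of Theorem 5.4, and `gamma` walks the tree $T$ with every label $j$ shifted to $j + 4(i-1)$, counting the cells it reads.

```python
from fractions import Fraction

# Example 5.14 as partial functions {cell: bit}; holes are missing keys.
F = {
    "a": {0: 0, 1: 0},
    "b": {0: 0, 1: 1},
    "c": {0: 1, 1: 0, 2: 0},
    "d": {0: 1, 1: 0, 2: 1},
    "e": {0: 1, 1: 1},
}
FIELD = 4  # r + 1: the labels of T are drawn from {0, 1, 2, 3}


def rho(d):
    """Concatenation-preserving representation: field i starts at FIELD*(i-1)."""
    cells = {}
    for i, x in enumerate(d):
        for j, bit in F[x].items():
            cells[FIELD * i + j] = bit
    return cells


def render(cells):
    """Show a partial function as a string, writing '_' for each hole."""
    return "".join(str(cells[k]) if k in cells else "_" for k in range(max(cells) + 1))


def gamma(i, memory):
    """Answer gamma_i by walking T with each label j shifted to j + FIELD*(i-1).
    Returns (answer, number of cells read)."""
    base = FIELD * (i - 1)
    reads = 0

    def m(j):
        nonlocal reads
        reads += 1
        return memory[base + j]

    if m(0) == 0:
        answer = "a" if m(1) == 0 else "b"
    elif m(1) == 1:
        answer = "e"
    else:
        answer = "c" if m(2) == 0 else "d"
    return answer, reads


# Kraft storage for f: 3 * 2^-2 + 2 * 2^-3 = 1.
assert sum(Fraction(1, 2 ** len(code)) for code in F.values()) == 1
```

For instance, `render(rho("bc"))` gives `01__100`, and `gamma(2, rho("bc"))` returns `("c", 3)`: three reads, which equals $|f(c)|$, as Kraft access requires.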


Notice, however, that the representation $\rho$ in Theorem 5.4 has many "gaps" in it. Even if we had constructed the tree $T$ so that each node at depth $j$ had label $j$, we would still have had gaps, unless $T$ were of uniform depth. If we require that $\rho$ be located in consecutive cells, then we cannot obtain Kraft access unless, for all $d_1, d_2 \in X^n$, $|\rho(d_1)| = |\rho(d_2)|$; i.e., $A[\gamma_i(\rho(d_1))] = A[\gamma_i(\rho(d_2))]$ for all $\gamma_i \in \Gamma$. We now show that if in Theorem 5.4 it had also been the case that $|X| = |\mathcal{B}|^k$, then there would have been an implementation achieving Kraft storage and access with a fixed-size representation and without any "gaps".


Theorem 5.5. Let $\mathbb{D} = X^n$ and $|X| = |\mathcal{B}|^k$ for $n, k \in \mathbb{N}^*$. Then there is an implementation $(Q, \rho)$ solving $(\Gamma, \mathbb{D})$ such that $\rho: \mathbb{D} \to \mathcal{B}^{nk}$ is a fixed-size representation achieving Kraft storage, and each $\gamma_i \in \Gamma$ achieves Kraft access.


Proof: Since we are given that $|X| = |\mathcal{B}|^k$, the equation $|X| = i(|\mathcal{B}|-1)+1$ is satisfied for $i = \sum_{j=0}^{k-1} |\mathcal{B}|^j$. Theorem 5.4 immediately tells us that there is some representation achieving Kraft storage and access, but we want to show that there is, in fact, such a fixed-size representation. As in the proof of Theorem 5.4, we define the concatenation-preserving representation $\rho: \mathbb{D} \to \mathcal{B}^{nk}$ by
$$\rho(d) = f(d(1)) \cdot f(d(2)) \cdots f(d(n)),$$
where $f: X \to \mathcal{B}^k$ corresponds to a tree $T$ of uniform depth $k$ in which each internal node at depth $j$ has label $j$. Certainly $f$, and therefore $\rho$, both achieve Kraft storage, as verified by
$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{d \in X^n} |\mathcal{B}|^{-nk} = |X|^n \, |\mathcal{B}|^{-nk} = 1.$$
Also, we implement $\gamma_{m+1} \in \Gamma$ by the same tree $T$, with labels $j$ replaced by $mk + j$. Each $\gamma_{m+1} \in \Gamma$ achieves Kraft access, since $A[\gamma_{m+1}(\rho(d))] = k$ and
$$\sum_{x \in X} |\mathcal{B}|^{-k} = |X| \, |\mathcal{B}|^{-k} = 1. \quad □$$


We can give an example, similar to Example 5.12, which illustrates this theorem.

Example 5.15. Let $\mathcal{B} = \{0,1\}$, $X = \{a,b,c,d\}$, and $\mathbb{D} = X^2$. Notice that $|X| = |\mathcal{B}|^2$, and Figure 5.6a shows a tree $T$ of uniform depth two corresponding to the representation $f: X \to \mathcal{B}^2$. Then we define the representation $\rho: \mathbb{D} \to \mathcal{B}^4$ by $\rho(d) = f(d(1)) \cdot f(d(2))$. For instance,

ρ(ac) = 0010
ρ(ad) = 0011
ρ(bb) = 0101
ρ(dc) = 1110

The tree $T$ of Figure 5.6a is the access tree for $\gamma_1$, and the access tree for $\gamma_2$ is shown in Figure 5.6b. □
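When $|X| = |\mathcal{B}|^k$, as in Theorem 5.5 and Example 5.15, the fixed-size representation is just a gap-free string of $k$-digit codewords. A minimal Python sketch for $\mathcal{B} = \{0,1\}$, assuming (as the listed values of Example 5.15 indicate) that $f$ assigns $a, b, c, d$ the codewords 00, 01, 10, 11 in order; `gamma` reads exactly the $k$ cells $(i-1)k, \ldots, ik-1$.

```python
# Example 5.15: B = {0, 1}, |X| = |B|^K with K = 2.
K = 2
SYMBOLS = "abcd"
# Assumed codewords: the K-bit binary numeral of each symbol's position.
F = {x: format(pos, f"0{K}b") for pos, x in enumerate(SYMBOLS)}


def rho(d):
    """Fixed-size, gap-free representation f(d(1)) f(d(2)) ... f(d(n))."""
    return "".join(F[x] for x in d)


def gamma(i, memory):
    """Answer gamma_i by reading exactly cells (i-1)K .. iK-1; returns (answer, reads)."""
    cells = memory[(i - 1) * K : i * K]
    return SYMBOLS[int(cells, 2)], len(cells)


for d in ("ac", "ad", "bb", "dc"):
    print(d, rho(d))          # 0010, 0011, 0101, 1110 as in Example 5.15
print(gamma(2, rho("dc")))    # ('c', 2)
```

Every list of length 2 occupies exactly 4 cells and every lookup reads exactly $k = 2$ of them, which is the "no gaps, fixed size" property of Theorem 5.5.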


Analogous to Theorem 5.4, we have the following corollary. 
Figure 5.6. Trees for $T$ and $\gamma_2$ of Example 5.15.


Corollary 5.5.1. Let $\mathbb{D} = \{\lambda\} \cup X$ and $|X| + 1 = |\mathcal{B}|^k$ for some $k \in \mathbb{N}^*$. Then there is an implementation $(Q, \rho)$ solving $(\Gamma, \mathbb{D})$ such that $\rho: \mathbb{D} \to \mathcal{B}^k$ is a fixed-size representation achieving Kraft storage, and each $\gamma_i \in \Gamma$ achieves Kraft access.


What we have proved in this section is a weak equivalence between the requirement that $\mathbb{D} = X^n$ (or $\mathbb{D} = \{\lambda\} \cup X$) and the requirement that there be some fixed-size implementation in which we achieve Kraft storage and access. More precisely, Theorem 5.2 told us that if there is a fixed-size representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ which achieves Kraft storage and for which each $\gamma_i \in \Gamma$ achieves Kraft access, then $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X$. Conversely, Theorem 5.5 and Corollary 5.5.1 essentially tell us that if $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X$, then there is some fixed-size representation which achieves Kraft storage and access. The condition $|X| = |\mathcal{B}|^k$ (or $|X| + 1 = |\mathcal{B}|^k$) was put in to avoid "rounding errors". If we do not have $|X| = |\mathcal{B}|^k$ for $\mathbb{D} = X^n$, then either we do not have Kraft storage or else our tree must have leaves at (at least) two depths, $j$ and $j+1$. This would cause $nj \le |\rho(d)| \le n(j+1)$, and so $\rho$ would not be of exactly fixed size. Or else we could let $\rho: \mathbb{D} \to \mathcal{B}^{n(j+1)}$ be fixed size, and then we would not quite attain Kraft storage. Thus, Theorems 5.2 and 5.5 allow us to prove the following result.
5.3 Endmarker Representations


Recall from Section 5.1 that an endmarker representation has some fixed symbol or sequence of symbols in $\mathcal{B}^{\dagger}$ which is always at the "end" of the list. Example 5.5a is an example of an endmarker representation. The representation $\rho$ in Example 2.7 is also an endmarker representation, with endmarker 0. We now give a formal definition.


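Definition. Let $f$ be a total function $f: \mathbb{D} \to \mathcal{B}^{\dagger}$, and let $\theta \in \mathcal{B}^{\dagger}$ ($\theta \ne \emptyset$). For each $d \in \mathbb{D}$, let $n(d) \in \mathbb{N}$ be such that $n(d) > \max D(f(d))$. Then a representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ which is defined by
$$\rho(d) = f(d) \cup \{\theta\}_{n(d)}$$
is an endmarker representation. The relation $\theta$ is known as the endmarker, and the function $f$ is the list component of $\rho$.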
To illustrate what this definition says, we present the following example. 


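Example 5.16. Let $X = \{a,b,c\}$, $\mathcal{B} = \{0,1\}$, and $\mathbb{D} = \bigcup_{i \in J} X^i$. Define the function $f: X \cup \{\varphi\} \to \mathcal{B}^{\dagger}$ by

f(a) = 0_0
f(b) = 10_
f(c) = 11_
f(φ) = 0_1

If we then define $f^*: \mathbb{D} \to \mathcal{B}^{\dagger}$ by
$$f^*(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{3(i-1)},$$
then the representation
$$\rho(d) = f^*(d) \cup \{f(\varphi)\}_{3|d|}$$
is a concatenation-preserving endmarker representation. For $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, it is easy to verify that $\rho$ achieves Kraft storage, since
$$|\rho(d)| = 2|d| + 2.$$
Thus,
$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = \sum_{i \in J} \sum_{d \in X^i} 2^{-(2|d|+2)} = \sum_{i=0}^{\infty} 3^i \, 2^{-(2i+2)} = \frac14 \sum_{i=0}^{\infty} \left(\frac34\right)^i = 1.$$
Note that no finite $J$ will give us Kraft storage.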

Now consider answering a table lookup question $\gamma_i \in \Gamma$. For $\gamma_1$ we need only access $m(0)$ and $m(1)$, or else $m(0)$ and $m(2)$. On the other hand, to answer the question $\gamma_2$, accessing just $m(3)$ and $m(4)$ (or else $m(3)$ and $m(5)$) may not give the correct answer. In particular, unless we have already determined that the answer is $\varphi$, we must verify that $|d| \ge 1$. This requires accessing $m(0)$ and possibly $m(2)$. Possible access trees $T_i$ for each $\gamma_i$ can be constructed as indicated in Figure 5.7, where we write $\{T_i\}_k$ to denote the tree $T_i$ with each node label $j$ replaced by the label $j + k$. These trees correspond to reading the necessary memory cells in left-to-right order. It would also be possible to read the cells essentially from right to left. For either method, once $f(\varphi)$ is encountered for $\gamma_i$, it is known that $|d| < i$. □
Figure 5.7. Trees for $\gamma_i$ of Example 5.16.
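The left-to-right access strategy of Example 5.16 can be simulated directly. The Python sketch below uses our reading of the codewords, in which $f(a)$ and $f(\varphi)$ occupy the first and third cells of a field while $f(b)$ and $f(c)$ occupy the first two; this hole layout is an assumption reconstructed from the access sequences in the text. `None` stands for the answer $\varphi$, and the function names are hypothetical.

```python
from fractions import Fraction

# Our reading of Example 5.16: each codeword uses two of the three cells of
# its field; holes are simply missing keys.  "phi" is the endmarker symbol.
F = {
    "a":   {0: 0, 2: 0},
    "b":   {0: 1, 1: 0},
    "c":   {0: 1, 1: 1},
    "phi": {0: 0, 2: 1},
}
WIDTH = 3


def rho(d):
    """f(d(i)) at offset 3(i-1), then the endmarker f(phi) at n(d) = 3|d|."""
    cells = {}
    for i, x in enumerate(list(d) + ["phi"]):
        for j, bit in F[x].items():
            cells[WIDTH * i + j] = bit
    return cells


def gamma(i, memory):
    """Scan fields 1..i from the left; returns (answer, cells read), where an
    answer of None means gamma_i(d) = phi because the list ended first."""
    reads = 0

    def m(cell):
        nonlocal reads
        reads += 1
        return memory[cell]

    for k in range(i):                  # field k + 1 starts at cell 3k
        base = WIDTH * k
        if m(base) == 1:                # 1?_ : b or c, never the endmarker
            if k == i - 1:
                answer = "b" if m(base + 1) == 0 else "c"
                return answer, reads
        elif m(base + 2) == 1:          # 0_1 : f(phi), the list has ended
            return None, reads
        elif k == i - 1:                # 0_0 : f(a)
            return "a", reads


def kraft_partial_sum(max_len):
    """Sum of 2^-|rho(d)| over all lists of length <= max_len; there are 3^i
    lists of length i, each with |rho(d)| = 2i + 2."""
    return sum(Fraction(3 ** i, 2 ** (2 * i + 2)) for i in range(max_len + 1))
```

With `mem = rho("bca")`, the calls `gamma(1, mem)` through `gamma(4, mem)` return `("b", 2)`, `("c", 3)`, `("a", 4)` and `(None, 6)`, so the cost grows with $i$ because the earlier fields must be checked for the endmarker. Likewise `kraft_partial_sum(N)` equals $1 - (3/4)^{N+1}$, which reaches 1 only in the limit, in line with the remark that no finite $J$ gives Kraft storage.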
The endmarker representations we have thus far seen are all concatenation-preserving representations, but there is no such requirement in the definition. In fact, there is not even any requirement that the endmarker be necessary; i.e., for an endmarker representation $\rho(d) = f(d) \cup \{\theta\}_{n(d)}$ it may be the case that $f$ itself is a representation of $\mathbb{D}$, and thus the endmarker $\theta$ is superfluous. Also, there is no restriction that the endmarker not appear in $f(d)$. Even if the pattern $\theta \in \mathcal{B}^{\dagger}$ does not appear in $f(d)$, there may be "holes" in $f(d)$, which allow the possibility of another user writing $\theta$. Thus, it may not be the case that the first occurrence of $\theta$ serves as the endmarker.


Example 5.17. Let $\mathcal{B} = \{0,1\}$ and, for $\mathbb{D} = \{d_1, d_2, d_3, d_4, d_5\}$, define the function $f: \mathbb{D} \to \mathcal{B}^{\dagger}$ by

f(d₁) = 0_0
f(d₂) = 01
f(d₃) = 10_
f(d₄) = _10
f(d₅) = 11

If we let $\theta = \{(0,1), (1,0)\}$, we can then define the endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ by

ρ(d₁) = 0_010
ρ(d₂) = 0110
ρ(d₃) = 1010
ρ(d₄) = _1010
ρ(d₅) = 1110

The endmarker here is not superfluous because it does enable us to distinguish between $\rho(d_2)$ and $\rho(d_4)$ and between $\rho(d_4)$ and $\rho(d_5)$. On the other hand, even if we were to eliminate $d_4$ from the domain, $\rho$ would still be an endmarker representation. Notice also that ρ(d₁) = 0_010, and thus if another user sets $m(1) = 1$, then the actual endmarker is not the first occurrence of $\theta$. In fact, $f(d_3)$ itself contains the set $\theta$. □
We usually have in mind a more restricted notion of an endmarker representation, where we require $\theta$ to be distinguishable and reserve $\theta$ solely to indicate the end of the list. (Of course, if the representation has holes in it, then it is still possible for other users to write $\theta$.) Thus, if we read a list representation from left to right and access no cells outside the representation, then encountering $\theta$ immediately tells us that we have reached the end. Most of our examples will be of this form.

Notice that the function $f^*$ in Example 5.16 has fixed position fields. However, the endmarker in the representation $\rho$ has a displacement function $n(d) = 3|d|$. So $n$ is not a constant function, and the endmarker is not always in the same memory position. In fact, if the endmarker were always at the same location, then there would be no point in having an endmarker at all; there is no such concatenation-preserving endmarker representation. We make the following definition.


Definition. Let $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ be a concatenation-preserving endmarker representation with endmarker $\theta$, formed from a concatenation-preserving function $f^*$ with fixed position fields $n_i$. If $\rho$ is of the form
$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|+1}},$$
then $\rho$ is said to be a fixed position field endmarker representation.

Thus, the representation $\rho$ in Example 5.16 is a fixed position field endmarker representation.


In Example 5.16 we saw an endmarker representation that achieves Kraft storage when $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. We can, in fact, show that achieving Kraft storage implies that $\max_{d \in \mathbb{D}} |\rho(d)|$ is unbounded.
Theorem 5.7. If an endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ achieves Kraft storage, then $\neg(\exists n \in \mathbb{N})(\forall d \in \mathbb{D})(|\rho(d)| < n)$.

Proof: Assume that $(\exists n \in \mathbb{N})(\forall d \in \mathbb{D})(|\rho(d)| < n)$. Then it is possible to choose $d_0 \in \mathbb{D}$ such that
$$\max D(\rho(d_0)) = \max_{d \in \mathbb{D}} \max D(\rho(d)).$$
Thus, no $\rho(d)$ occupies a larger memory cell location than $\rho(d_0)$. By the definition of an endmarker representation, there is some function $f$ such that
$$\rho(d_0) = f(d_0) \cup \{\theta\}_{n(d_0)}.$$
Now let $r = \min D(\theta)$, and choose $b_0 \in \mathcal{B}$ such that $b_0$ is not a prefix of $\theta$. (Since $|\mathcal{B}| \ge 2$, there must always be such a $b_0$.) Consider the string
$$b = f(d_0) \cup \{(n(d_0) + r, b_0)\} \in \mathcal{B}^{\dagger}.$$
For all $d_j \in \mathbb{D}$, $b$ and $\rho(d_j)$ are distinguishable. In other words, there is no $d_j \in \mathbb{D}$ such that $b$ and $\rho(d_j)$ are indistinguishable. So by Theorem 3.3, $\rho$ does not achieve Kraft storage. Thus, our original assumption must have been wrong, and we conclude that $\neg(\exists n \in \mathbb{N})(\forall d \in \mathbb{D})(|\rho(d)| < n)$. □


It immediately follows that if an endmarker representation achieves Kraft storage, then the domain $\mathbb{D}$ must be infinite, and also that the index set $J$ must be infinite.


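Corollary 5.7.1. If an endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ achieves Kraft storage, then $\neg(\exists n \in \mathbb{N})(|\mathbb{D}| < n)$.

Corollary 5.7.2. If an endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ achieves Kraft storage, then $\neg(\exists n \in \mathbb{N})(\forall i \in J)(i < n)$.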


Thus, when we are discussing endmarker representations, we frequently consider 
Notice that since achieving Kraft storage tells us that the domain must be infinite, we immediately know that no endmarker representation can achieve both Kraft storage and Kraft access.


Theorem 5.8. There is no endmarker representation that achieves Kraft storage and also achieves Kraft access for all $\gamma_i \in \Gamma$.

Proof: From Theorems 4.9 and 4.10, we know that if a representation $\rho$ achieves Kraft storage and Kraft access for all $\gamma_i \in \Gamma$, then $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X$. But by Corollary 5.7.1 we know that $|\mathbb{D}|$ cannot be finite for an endmarker representation that achieves Kraft storage. Thus, there is no endmarker representation that achieves both Kraft storage and Kraft access. □


Recall again the representation $\rho$ in Example 5.16, which achieved Kraft storage. We can show that this result generalizes. In particular, given any representation $f: X \cup \{\varphi\} \to \mathcal{B}^{\dagger}$ which achieves Kraft storage, a concatenation-preserving endmarker representation $\rho$ formed from $f$ also achieves Kraft storage. Before we prove this, however, we introduce some terminology and prove a lemma. We begin with the following definition.


Definition. Consider a full $|\mathcal{B}|$-ary tree $T'$ with $|\mathcal{B}|^k$ nodes at depth $k$, for all $k \in \mathbb{N}$. Assume that some of the (internal) nodes are labelled $\varphi$, but that $T'$ has the property that if a node is labelled $\varphi$, then none of the descendants of that node is labelled. We use the term $\varphi$-node to refer to a node labelled $\varphi$ or a descendant of a node labelled $\varphi$. We then let $\xi(k)$ denote the fraction of the nodes in $T'$ at depth $k$ that are $\varphi$-nodes.


Since $\xi(k)$ is a fraction of nodes that are $\varphi$-nodes, it is clear that $0 \le \xi(k) \le 1$. Also $\xi(k+1) \ge \xi(k)$, since a $\varphi$-node at some level leads to the same fraction of $\varphi$-node descendants at the next level. The following example should clarify what is meant by a $\varphi$-node and by $\xi(k)$.


Example 5.18. Consider the tree $T'$ in Figure 5.8. For simplicity we have deleted the node labels indicating memory cell locations. We have, however, retained the external label $\varphi$ on certain nodes and marked each $\varphi$-node with an "x". Notice that all descendants of nodes labelled $\varphi$ are themselves $\varphi$-nodes. There is 1 $\varphi$-node at depth 2, there are 3 $\varphi$-nodes at depth 3, 8 $\varphi$-nodes at depth 4, 19 $\varphi$-nodes at depth 5, etc. Thus
$$\xi(0) = \xi(1) = 0,$$
$$\xi(2) = \tfrac14,$$
$$\xi(3) = \tfrac38 = \xi(2) + \tfrac18,$$
$$\xi(4) = \tfrac{8}{16} = \xi(3) + \tfrac{2}{16},$$
$$\xi(5) = \tfrac{19}{32} = \xi(4) + \tfrac{3}{32}.$$
We shall have occasion to refer back to this tree $T'$ in a later example. □
Figure 5.8. Tree $T'$ from Example 5.18.


In order to motivate some of the terminology used in the next lemma, let us consider another example.
Example 5.19. Let $X = \{a,b\}$, $\mathcal{B} = \{0,1\}$, and consider the representation $f: X \cup \{\varphi\} \to \mathcal{B}^{\dagger}$ defined by

f(a) = 1
f(b) = 00
f(φ) = 01

Then $f$ achieves Kraft storage and corresponds to the tree $T_1$ shown in Figure 5.9a. Now, for $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, define a concatenation-preserving endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ by
$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{f(\varphi)\}_{n(d)},$$
where $n_i(d) = \sum_{j=1}^{i-1} |f(d(j))|$ and $n(d) = \sum_{j=1}^{|d|} |f(d(j))|$. Then we can construct a tree $T$ for representation $\rho$ as in Figure 5.9b. □


We can now prove the following lemma. 


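Lemma 5.5. Consider a prefix representation $f: X \cup \{\varphi\} \to \mathcal{B}^{\dagger}$ which achieves Kraft storage. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a concatenation-preserving endmarker representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ defined by
$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{f(\varphi)\}_{n(d)},$$
where $n_i: \mathbb{D} \to \mathbb{N}$, $n: \mathbb{D} \to \mathbb{N}$. Let $T$ be a $|\mathcal{B}|$-ary tree corresponding to $\rho$, and let $T'$ be an extension of $T$ which keeps the $\varphi$-node labels of $T$ but extends the tree so that $T'$ has $|\mathcal{B}|^k$ nodes at depth $k$, for all $k \in \mathbb{N}$, and the $\varphi$ labels now label internal nodes. Then
$$\lim_{k \to \infty} \xi(k) = 1.$$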


Proof: Since the prefix representation $f$ achieves Kraft storage, there is a corresponding full $|\mathcal{B}|$-ary tree $T_1$, as shown in Figure 5.9a. Assume that $|f(\varphi)| = r$ and that $\max_{x \in X \cup \{\varphi\}} |f(x)| = p$. Then $T_1$ has (maximum) depth $p$, and the depth of its $\varphi$-node is $r$. The tree $T$ corresponding to $\rho$ is formed from $T_1$ by placing a copy of $T_1$ at each leaf not labelled $\varphi$ and doing this indefinitely. (The memory cells to be accessed need to be altered according to the values of $n_i(d)$ and $n(d)$. Since we know, however, that no path will contain the same memory location twice, we choose to ignore these access labels and are concerned only with the external labels at a $\varphi$-node indicating the $d$ such that $\rho(d)$ leads to this node.) $T'$ is the extension of $T$ where we keep the $\varphi$-node leaf labels but extend from each of these leaves a full $|\mathcal{B}|$-ary tree. Thus, for all $k \in \mathbb{N}$, $T'$ has $|\mathcal{B}|^k$ nodes at depth $k$.

Figure 5.9. Trees $T_1$ and $T$ from Example 5.19.

We are now ready to determine $\lim_{k \to \infty} \xi(k)$. It is clear that $\xi(i+1) \ge \xi(i)$, because if there are $j$ $\varphi$-nodes at depth $i$, then there are $|\mathcal{B}| \cdot j$ descendants of these $\varphi$-nodes at depth $i+1$. Thus, the fraction of these $\varphi$-nodes cannot decrease. Also, there may be more $\varphi$-nodes at depth $i+1$, corresponding to copies of $T_1$ with leaves labelled $\varphi$ at depth $i+1$. Each node at depth $i$ which is not a $\varphi$-node will have a descendant within depth $p$ which is a $\varphi$-node. Thus, at least a fraction $|\mathcal{B}|^{-p}$ of the descendants of non-$\varphi$-nodes at depth $i$ will themselves be $\varphi$-nodes at depth $i+p$. Since the fraction of non-$\varphi$-nodes at depth $i$ is $1 - \xi(i)$,
$$\xi(i+p) \ge \xi(i) + |\mathcal{B}|^{-p}\,(1 - \xi(i)) = \frac{1}{|\mathcal{B}|^p} + \frac{|\mathcal{B}|^p - 1}{|\mathcal{B}|^p}\,\xi(i).$$
If we look at the values of $\xi(k)$ at depths $0, p, 2p$, etc., we find that
$$\xi(kp) \ge \frac{1}{|\mathcal{B}|^p} \sum_{j=0}^{k-1} \left(\frac{|\mathcal{B}|^p - 1}{|\mathcal{B}|^p}\right)^j = 1 - \left(\frac{|\mathcal{B}|^p - 1}{|\mathcal{B}|^p}\right)^k.$$
Of course we know that $\xi(kp) \le 1$, and so we conclude that
$$\lim_{k \to \infty} \xi(k) = 1. \quad □$$
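The bound in the proof of Lemma 5.5 can be compared with exact values of $\xi(k)$. The Python sketch below is our own illustration: it counts copies of $T_1$ rooted at each depth ($R(t) = \sum_{x \in X} R(t - |f(x)|)$ with $R(0) = 1$), obtains the number $W(i) = R(i - |f(\varphi)|)$ of $\varphi$-leaves at depth $i$ with no $\varphi$ ancestor, and uses the fact that each such leaf accounts for a fraction $|\mathcal{B}|^{-i}$ of every deeper level, so that $\xi(k) = \sum_{i \le k} W(i)\,|\mathcal{B}|^{-i}$. It assumes, as the lemma does, that $f(X \cup \{\varphi\})$ is a prefix code.

```python
from fractions import Fraction


def xi_values(code_lengths, phi_length, arity, depth):
    """Exact xi(0), ..., xi(depth) for the tree T' of Lemma 5.5.

    code_lengths: the lengths |f(x)| for x in X
    phi_length:   the length |f(phi)|; f(X u {phi}) must be a prefix code
    """
    roots = [0] * (depth + 1)     # roots[t]: copies of T_1 rooted at depth t
    roots[0] = 1
    for t in range(1, depth + 1):
        # every non-phi leaf at depth t is the root of a new copy of T_1
        roots[t] = sum(roots[t - length] for length in code_lengths if length <= t)
    xi, acc = [], Fraction(0)
    for i in range(depth + 1):
        w = roots[i - phi_length] if i >= phi_length else 0   # W(i)
        acc += Fraction(w, arity ** i)
        xi.append(acc)
    return xi


# Example 5.19: |f(a)| = 1, |f(b)| = 2, |f(phi)| = 2, binary, so p = 2.
xi = xi_values([1, 2], 2, 2, 12)
# xi(0..5) should be 0, 0, 1/4, 3/8, 1/2, 19/32, as counted in Example 5.18.
p = 2
for k in range(7):
    assert xi[k * p] >= 1 - Fraction(3, 4) ** k   # bound from the proof
```

For this tree the bound is tight at $k = 1$ ($\xi(2) = 1/4$) and loose afterwards; for example $\xi(6) = 43/64$ against the bound $37/64$.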


To illustrate the method used in proving Lemma 5.5, we refer back to Example 5.19.


Example 5.20. Recall the representation $\rho$ from Example 5.19. The extension of $T$ to a tree $T'$ with $|\mathcal{B}|^k$ nodes at depth $k$ is the tree $T'$ of Example 5.18, shown in Figure 5.8. The tree $T_1$, from which $T$ and $T'$ were constructed, has maximum depth 2, and the depth of its $\varphi$-node is 2. We want to verify that
$$\xi(i+2) \ge \xi(i) + \tfrac14\,(1 - \xi(i)).$$
The fraction of non-$\varphi$-nodes at depth $i$ is $1 - \xi(i)$. Every non-$\varphi$-node at depth $i$ serves either as a root of another copy of $T_1$ (see node A in Figure 5.10a) or else is an internal node of some $T_1$ copy (see node B in Figure 5.10b). In the former case, we get a new $\varphi$-node at depth $i+2$. In the latter case, we get a new $\varphi$-node at depth $i+1$, which gives us two additional $\varphi$-nodes at depth $i+2$. □


Lemma 5.5 allows us to prove the following result.
Figure 5.10. Origination of new $\varphi$-nodes at depth $i+2$.


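Theorem 5.9. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a representation $f: X \cup \{\varphi\} \to \mathcal{B}^{\dagger}$ which achieves Kraft storage. Assume that the set $f(X \cup \{\varphi\})$ forms a prefix code, and let $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ be a concatenation-preserving endmarker representation defined by
$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{f(\varphi)\}_{n(d)},$$
where $n_i: \mathbb{D} \to \mathbb{N}$, $n: \mathbb{D} \to \mathbb{N}$. Then $\rho$ achieves Kraft storage.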


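Proof: Let $W(i)$ be the distribution function
$$W(i) = |\{\rho(d) : |\rho(d)| = i\}|.$$
So $W(i)$ corresponds to the number of $\varphi$-nodes at depth $i$ that have no $\varphi$-node ancestors. Then
$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{i=0}^{\infty} W(i)\,|\mathcal{B}|^{-i} = \lim_{k \to \infty} \sum_{i=0}^{k} W(i)\,|\mathcal{B}|^{-i} = \lim_{k \to \infty} |\mathcal{B}|^{-k} \sum_{i=0}^{k} W(i)\,|\mathcal{B}|^{k-i}.$$
A $\varphi$-node at depth $i$ is an ancestor of $|\mathcal{B}|^{k-i}$ descendants at depth $k$, and so there are $\sum_{i=0}^{k} W(i)\,|\mathcal{B}|^{k-i}$ $\varphi$-nodes at depth $k$. Since at depth $k$ there are a total of $|\mathcal{B}|^k$ nodes, the fraction of nodes at depth $k$ which are $\varphi$-nodes is
$$|\mathcal{B}|^{-k} \sum_{i=0}^{k} W(i)\,|\mathcal{B}|^{k-i} = \xi(k).$$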
Applying Lemma 5.5 gives the desired result:
$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = \lim_{k \to \infty} \xi(k) = 1. \quad □$$

The above theorem still holds if we do not require that $f(X \cup \{\varphi\})$ be a prefix code.


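Theorem 5.10. Consider a representation $f: X \cup \{\varphi\} \to \mathcal{B}^{\dagger}$ which achieves Kraft storage. For $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, let $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ be a concatenation-preserving endmarker representation, where
$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{f(\varphi)\}_{n(d)},$$
for $n_i: \mathbb{D} \to \mathbb{N}$, $n: \mathbb{D} \to \mathbb{N}$. Then $\rho$ achieves Kraft storage.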


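Proof: Consider any representation $f_1: X \cup \{\varphi\} \to \mathcal{B}^{\dagger}$ which achieves Kraft storage (playing the role of $f$ in the theorem), and recall from Chapter 3 the statement of the Kraft inequality. Since $f_1$ achieves Kraft storage,
$$\sum_{x \in X \cup \{\varphi\}} |\mathcal{B}|^{-|f_1(x)|} = 1,$$
and the Kraft inequality is satisfied (with equality). Thus, there is some function $f_2: X \cup \{\varphi\} \to \mathcal{B}^{\dagger}$ such that $f_2(X \cup \{\varphi\})$ is a prefix code and $|f_2(x)| = |f_1(x)|$ for all $x \in X \cup \{\varphi\}$. By Theorem 5.9 we know that, for any concatenation-preserving endmarker representation $\rho_2$ formed from $f_2$,
$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho_2(d)|} = \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-\left(\sum_{i=1}^{|d|} |f_2(d(i))| + |f_2(\varphi)|\right)} = 1.$$
Since the representation $\rho$ formed from $f_1$ satisfies
$$|\rho(d)| = \sum_{i=1}^{|d|} |f_1(d(i))| + |f_1(\varphi)| = \sum_{i=1}^{|d|} |f_2(d(i))| + |f_2(\varphi)|,$$
we can conclude that
$$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = 1,$$
and so $\rho$ achieves Kraft storage. □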


We can verify directly that the representation $\rho$ from Examples 5.19 and 5.20 achieves Kraft storage.


Example 5.21. What we want to show is that
$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = \sum_{i=0}^{\infty} W(i)\,2^{-i} = 1.$$
Referring to the tree $T$ of Figure 5.9b, we see that

W(0) = W(1) = 0
W(2) = 1
W(3) = 1
W(4) = 2
W(5) = 3

In fact, whenever a copy of $T_1$ terminates at depth $i$, there is a leaf of that copy at depth $i-1$ which serves as the root of another copy of $T_1$, one which has a $\varphi$-leaf at depth $i+1$. Similarly, if a copy of $T_1$ terminates at depth $i-1$, then there is a leaf of that copy also at depth $i-1$ which serves as the root of a copy of $T_1$, leading to a $\varphi$-leaf at depth $i+1$. Thus, we can define the distribution function $W$ by

W(1) = 0
W(2) = 1
W(i+1) = W(i) + W(i-1)

Solving this Fibonacci recurrence, we find that
$$W(i) = \frac{5-\sqrt{5}}{10}\left(\frac{1+\sqrt{5}}{2}\right)^{i} + \frac{5+\sqrt{5}}{10}\left(\frac{1-\sqrt{5}}{2}\right)^{i}$$
for $i \ge 1$. Thus, we can directly verify that $\rho$ achieves Kraft storage:
$$\sum_{i=0}^{\infty} W(i)\,2^{-i} = \frac{5-\sqrt{5}}{10}\sum_{i=1}^{\infty}\left(\frac{1+\sqrt{5}}{4}\right)^{i} + \frac{5+\sqrt{5}}{10}\sum_{i=1}^{\infty}\left(\frac{1-\sqrt{5}}{4}\right)^{i} = \frac{5-\sqrt{5}}{10}\cdot\frac{1+\sqrt{5}}{3-\sqrt{5}} + \frac{5+\sqrt{5}}{10}\cdot\frac{1-\sqrt{5}}{3+\sqrt{5}} = 1. \quad □$$
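The recurrence, the closed form, and the limiting sum in Example 5.21 can be cross-checked numerically. A minimal Python sketch (our own illustration): `W` follows the recurrence, `W_closed` the closed form in floating point, and the partial sums, which equal $\xi(N)$ for the tree $T'$, are computed with exact fractions.

```python
import math
from fractions import Fraction


def W(i):
    """Example 5.21: W(0) = W(1) = 0, W(2) = 1, W(i+1) = W(i) + W(i-1)."""
    if i < 2:
        return 0
    prev, cur = 0, 1                  # W(1), W(2)
    for _ in range(i - 2):
        prev, cur = cur, cur + prev
    return cur


def W_closed(i):
    """Closed form from the text, valid for i >= 1 (floating point)."""
    s5 = math.sqrt(5)
    return ((5 - s5) / 10) * ((1 + s5) / 2) ** i + ((5 + s5) / 10) * ((1 - s5) / 2) ** i


def partial_kraft(N):
    """Exact value of sum_{i=0}^{N} W(i) * 2^-i, i.e. xi(N) for the tree T'."""
    return sum(Fraction(W(i), 2 ** i) for i in range(N + 1))


print([W(i) for i in range(8)])        # [0, 0, 1, 1, 2, 3, 5, 8]
print(float(1 - partial_kraft(100)))   # small positive remainder
```

The partial sums approach 1 from below; `partial_kraft(5)` is $19/32$, matching $\xi(5)$ in Example 5.18.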
As an aside for interested number theorists, notice that the sum in Example 5.21 holds for any extended Fibonacci sequence $W(i)$.


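Corollary 5.10.1. Let $\mathrm{fib}_n(i) = \mathrm{fib}_n(i-1) + \mathrm{fib}_n(i-2) + \cdots + \mathrm{fib}_n(i-n)$, with $\mathrm{fib}_n(i) = 0$ for $i < n$ and $\mathrm{fib}_n(n) = 1$. Then
$$\sum_{i=0}^{\infty} \mathrm{fib}_n(i)\,2^{-i} = 1.$$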
Proof: Consider a binary tree $T_1$, of the form shown in Figure 5.11a, which has internal node labels $0, 1, \ldots, n-1$ (for $0 \le i < n$, there is one internal node at depth $i$, and that node has label $i$) and has one leaf at each of the depths $1, 2, 3, \ldots, n-1$ and two leaves at depth $n$, one of which is labelled $\varphi$. Consider the extension $T$ of $T_1$, as in Figure 5.9b. If a copy of $T_1$ has a $\varphi$-node at depth $i-k$, for $1 \le k \le n$, then that copy of $T_1$ has its root at depth $i-n-k$ and thus has a leaf at depth $i-n$ which is not a $\varphi$-node. This leaf, not itself a $\varphi$-node, must serve as the root of yet another copy of $T_1$, and this new copy of $T_1$ has a $\varphi$-leaf at depth $i$. Thus
$$W(i) = W(i-1) + W(i-2) + \cdots + W(i-n).$$
But by Theorem 5.10 we know that the extension $T$ of $T_1$ corresponds to a representation $\rho$ which achieves Kraft storage. Thus
$$\sum_{i=0}^{\infty} W(i)\,2^{-i} = 1 = \sum_{i=0}^{\infty} \mathrm{fib}_n(i)\,2^{-i}. \quad □$$
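Corollary 5.10.1 can be checked the same way for several $n$. In the sketch below (Python, our own illustration) the initial values $\mathrm{fib}_n(i) = 0$ for $i < n$ and $\mathrm{fib}_n(n) = 1$ are those forced by the tree $T_1$ in the proof; for $n = 2$ the sequence is exactly the $W$ of Example 5.21.

```python
from fractions import Fraction


def fib_n(n, N):
    """fib_n(0..N) with fib_n(i) = 0 for i < n, fib_n(n) = 1, and
    fib_n(i) = fib_n(i-1) + ... + fib_n(i-n) for i > n."""
    seq = [0] * (N + 1)
    if n <= N:
        seq[n] = 1
    for i in range(n + 1, N + 1):
        seq[i] = sum(seq[i - n:i])
    return seq


def partial_sum(n, N):
    """Exact value of sum_{i=0}^{N} fib_n(i) * 2^-i."""
    return sum(Fraction(v, 2 ** i) for i, v in enumerate(fib_n(n, N)))


for n in (1, 2, 3, 4):
    gap = 1 - partial_sum(n, 400)
    print(n, float(gap))    # a small positive remainder that shrinks as N grows
```

The remainder shrinks more slowly for larger $n$, since the growth rate of $\mathrm{fib}_n$ approaches 2 and the terms $\mathrm{fib}_n(i)\,2^{-i}$ decay more slowly.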
5.4 Pointer Representations


Recall from Section 5.1 that a pointer representation has some function $\varepsilon: J \to \mathcal{B}^{\dagger}$ which serves as a pointer and indicates the length $|d|$. Example 5.5b gave an example of a pointer representation, and we now give a formal definition.


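Definition. Let $\mathbb{D} = \bigcup_{i \in J} X^i$, let $f$ be a total function $f: \mathbb{D} \to \mathcal{B}^{\dagger}$, and let $\varepsilon: J \to \mathcal{B}^{\dagger}$ be a representation. Then a representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ which is defined by
$$\rho(d) = \{\varepsilon(|d|)\}_{n_1(d)} \cup \{f(d)\}_{n_2(d)}$$
is a pointer representation if
$$D(\{\varepsilon(|d|)\}_{n_1(d)}) \cap D(\{f(d)\}_{n_2(d)}) = \emptyset,$$
where $n_1, n_2$ are functions, $n_1: \mathbb{D} \to \mathbb{N}$, $n_2: \mathbb{D} \to \mathbb{N}$. The function $f$ is the list component of $\rho$ and $\varepsilon$ is the pointer component of $\rho$. We refer to $\varepsilon(|d|)$ as the pointer of $\rho(d)$.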


Note that the functions $n_1, n_2$ in the above definition are not the same functions as the $n_i$ in the definition of a concatenation-preserving function. Before discussing the pointer representation in more detail, let us present the following example in order to illustrate the definition.


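Example 5.22. Let $X = \{a,b,c,d\}$, $\mathcal{B} = \{0,1\}$, and $\mathbb{D} = \bigcup_{i \in J} X^i$. Define the function $f: X \to \mathcal{B}^{\dagger}$ by

f(a) = 0_0
f(b) = 10_
f(c) = 11_
f(d) = 0_1

and then define the concatenation-preserving function $f^*: \mathbb{D} \to \mathcal{B}^{\dagger}$ by
$$f^*(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{3(i-1)}.$$
The pointer $\varepsilon: J \to \mathcal{B}^{\dagger}$ is defined by
$$\varepsilon(i) = 1^i 0.$$
Then the representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$, where
$$\rho(d) = \{\varepsilon(|d|)\}_0 \cup \{f^*(d)\}_{|d|+1},$$
is a pointer representation. For instance,

ρ(abad) = 111100_010_0_00_1
ρ(bdb) = 111010_0_110_
ρ(λ) = 0

Notice that
$$|\rho(d)| = |d| + 1 + |f^*(d)| = 3|d| + 1.$$
For $J = \mathbb{N}$, $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, and it is easy to verify that $\rho$ achieves Kraft storage:
$$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = \sum_{i \in J} \sum_{d \in X^i} 2^{-(3i+1)} = \sum_{i=0}^{\infty} 4^i\,2^{-(3i+1)} = \frac12 \sum_{i=0}^{\infty} \left(\frac12\right)^i = 1.$$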


Now consider answering the table lookup question $\gamma_i \in \Gamma$. The answer to the question $\gamma_i$ is essentially found at memory locations beginning with cell $3(i-1)$, except that we have stored the pointer in front of $f^*(d)$, and so $f^*(d)$ has been displaced by $|d|+1$ cells. Thus, the answer to $\gamma_i$, for $i \le |d|$, is found by reading $m(|d|+1+3(i-1))$ and then reading either $m(|d|+1+3(i-1)+1)$ or $m(|d|+1+3(i-1)+2)$. When $i > |d|$, we need only read the pointer to determine that the answer is $\varphi$. One possible algorithm to answer the question $\gamma_i$ therefore has the memory cell access sequences:

0, 1, ..., |d|-1, |d|, |d|+3(i-1)+1, |d|+3(i-1)+2    if m(|d|+3i-2) = 1, |d| ≥ i
0, 1, ..., |d|-1, |d|, |d|+3(i-1)+1, |d|+3(i-1)+3    if m(|d|+3i-2) = 0, |d| ≥ i
0, 1, ..., |d|-1, |d|                                if |d| < i

This immediately tells us the total number of accesses made:
$$A[\gamma_i(\rho(d))] = \begin{cases} |d|+3 & \text{if } |d| \ge i \\ |d|+1 & \text{if } |d| < i \end{cases}$$
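The pointer representation of Example 5.22 and the access counts just derived can be reproduced with a short simulation. This Python sketch uses our reading of the codewords (f(a) = 0_0, f(b) = 10_, f(c) = 11_, f(d) = 0_1, holes as missing keys); the helper names are hypothetical, and `gamma` follows the algorithm above, reading the whole pointer before touching the list.

```python
# Our reading of Example 5.22's codewords; holes are missing keys.
F = {"a": {0: 0, 2: 0}, "b": {0: 1, 1: 0}, "c": {0: 1, 1: 1}, "d": {0: 0, 2: 1}}


def rho(d):
    """Pointer epsilon(|d|) = 1^|d| 0 in cells 0..|d|, followed by f*(d)
    displaced by |d| + 1; field i of f*(d) starts at 3(i-1)."""
    n = len(d)
    cells = {k: 1 for k in range(n)}
    cells[n] = 0
    for i, x in enumerate(d):
        for j, bit in F[x].items():
            cells[n + 1 + 3 * i + j] = bit
    return cells


def render(cells):
    """Show a partial function as a string, writing '_' for each hole."""
    return "".join(str(cells[k]) if k in cells else "_" for k in range(max(cells) + 1))


def gamma(i, memory):
    """Read the whole pointer, then (if i <= |d|) two cells of field i.
    Returns (answer, cells read); None stands for the answer phi."""
    reads, length = 0, 0
    while True:
        reads += 1
        if memory[length] == 0:
            break
        length += 1
    if i > length:
        return None, reads
    base = length + 1 + 3 * (i - 1)
    reads += 2
    if memory[base] == 1:
        return ("b" if memory[base + 1] == 0 else "c"), reads
    return ("a" if memory[base + 2] == 0 else "d"), reads


print(render(rho("abad")))     # 111100_010_0_00_1
print(gamma(2, rho("abad")))   # ('b', 7), i.e. |d| + 3 accesses
```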


The intuition behind the definition of a pointer representation is that we encode the length so that, in order to answer a question $\gamma_i$, we need only read the pointer and can then look up the answer. In the case of the endmarker representation, we were forced to actually read the list. The question remains, however, why we chose to allow the pointer to encode $|d|$ rather than $|f(d)|$. If we wish to be able to access individual list elements, as by asking the questions in $\Gamma$, then it is reasonable to encode $|d|$. Reading the pointer will then at least tell us immediately whether the answer to $\gamma_i$ is $\varphi$ or not. On the other hand, if we wish to perform the update operation of appending an element to the end of the list, then it would be advantageous to know $|f(d)|$. If
$$(\forall d_1, d_2 \in \mathbb{D})(|d_1| = |d_2| \Leftrightarrow |f(d_1)| = |f(d_2)|),$$
then it of course makes no difference whether the pointer encodes $|d|$ or $|f(d)|$, since we can determine one from the other.


Example 5.23. Reconsider the functions $f$ and $f^*$ from Example 5.22, but define a partial function $\varepsilon': \mathbb{N} \to \mathcal{B}^{\dagger}$ such that $D(\varepsilon') = \{2i \mid i \in J\}$, the even natural numbers, and $\varepsilon'(n) = 1^{n/2}0$. Notice that $\varepsilon'$ is a representation. So the representation $\rho': \mathbb{D} \to \mathcal{B}^{\dagger}$ defined by
$$\rho'(d) = \{\varepsilon'(|f^*(d)|)\}_0 \cup \{f^*(d)\}_{|d|+1}$$
is equivalent to the representation $\rho$ in Example 5.22 because
$$\varepsilon'(|f^*(d)|) = \varepsilon(|d|).$$
Technically, however, the representation $\rho'$ is not a pointer representation because $\varepsilon': \{2i \mid i \in J\} \to \mathcal{B}^{\dagger}$, whereas the definition requires that $\varepsilon: J \to \mathcal{B}^{\dagger}$. But since $|D(\varepsilon')| = |D(\varepsilon)| = |J|$, we often find it convenient to loosely refer to $\rho'$ as a pointer representation itself. □


We could have written the formal definition of a pointer to allow a mapping $\varepsilon': \mathbb{N} \to \mathcal{B}^{\dagger}$ where $|D(\varepsilon')| = |J|$, but we chose not to, since the added generality would make the definition statement more complex and would not improve our results.

We shall, however, allow one conceptual extension for pointer representations. Since we require the pointer and list components, $\varepsilon(|d|)$ and $f(d)$, to be placed in memory so as not to overlap, we may want to view them as being stored in separate sections of memory. In other words, we could view $f(d)$ as being stored in memory as usual and $\varepsilon(|d|)$ as being stored in an auxiliary section of memory, perhaps some sort of register. However, we shall not in general want to bound the size of the pointer, and we do not differentiate between the cost of a pointer access and the cost of a list access, so it is easier to view the pointer as also being in memory. We simply assume that the memory manager allocates separate areas to the list and the pointer. Perhaps they are even interspersed, but we do not want to have to alter our coding schemes to take this into account. Therefore, for numbering simplicity, we may choose to allow both the list and the pointer to begin at cell number 0 and just note that the representations are separate and therefore disjoint. In this way, the storage of $f(d)$ in memory does not have to depend on the memory location of $\varepsilon(|d|)$.


Definition. Let $f$ be a total function $f: \mathbb{D} \to \mathcal{B}^{\dagger}$ and let $\varepsilon$ be a representation $\varepsilon: J \to \mathcal{B}^{\dagger}$. Assume that $f(d)$ and $\varepsilon(|d|)$ are stored in separate sections of memory. Then we refer to a pointer representation $\rho: \mathbb{D} \to \mathcal{B}^{\dagger}$ formed from $\varepsilon$ and $f$ as a separate pointer representation and write
$$\rho(d) = f(d) \cup \varepsilon(|d|).$$

In order to avoid possible confusion, when we have in mind a separate pointer representation, we shall explicitly say so.
Example 5.24. Reconsider the pointer representation $\rho$ of Example 5.22, but assume that the pointer and the list components are stored in separate memory sections. So $\rho$ is a separate pointer representation, and we denote it by
$$\rho(d) = \varepsilon(|d|) \cup f^*(d).$$
Certainly we have not altered the storage costs from those of Example 5.22, but it is possible to implement each $\gamma_i$ in such a way that we decrease the access costs. Possible access sequences for $\gamma_i \in \Gamma$ are:

0, 1, ..., i-1, 3(i-1), 3(i-1)+1    if m(3(i-1)) = 1, |d| ≥ i
0, 1, ..., i-1, 3(i-1), 3(i-1)+2    if m(3(i-1)) = 0, |d| ≥ i
0, 1, ..., |d|-1, |d|               if |d| < i

Thus we have for the total number of accesses:
$$A[\gamma_i(\rho(d))] = \begin{cases} i+2 & \text{if } |d| \ge i \\ |d|+1 & \text{if } |d| < i \end{cases}$$
Notice that this represents an improvement over the access costs we previously had. □


Although we shall not in general concern ourselves with the way in which separate memory sections are allocated, let us note, in the context of this same example, one possible scheme.


Example 5.25. Let $f$ be defined as in Example 5.22, but now define the concatenation-preserving representation $f_1:\mathbb{D} \to \mathcal{B}^+$ by

$f_1(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{4(i-1)}$.

If we view the pointer $\ell_1$ as being interleaved with the list, we may define

$\ell_1(i) = \{(3+4j, 1) \mid 0 \le j < i\} \cup \{(3+4i, 0)\}$.

Then the pointer representation $\rho_1:\mathbb{D} \to \mathcal{B}^+$ is defined by

$\rho_1(d) = \{\ell_1(|d|)\}_0 \cup \{f_1(d)\}_0$.


For instance,

$\rho_1(abad)$ = 0.0110 1001011 0
$\rho_1(bib)$ = 101011101 0
$\rho_1(\lambda)$ = ___0


Since we are counting only the actual number of cells occupied, the storage has not been altered. For $\gamma_i \in \Gamma$ we have the following access sequences:

$3, 7, 11, \ldots, 3+4(i-1),\ 4i-4,\ 4i-3$ if $m(4i-4) = 1$, $|d| \ge i$
$3, 7, 11, \ldots, 3+4(i-1),\ 4i-4,\ 4i-2$ if $m(4i-4) = 0$, $|d| \ge i$
$3, 7, 11, \ldots, 3+4(|d|-1),\ 3+4|d|$ if $|d| < i$

This gives us the same total number of accesses as in Example 5.24, where we simply assumed that we had separate memory sections.
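The following minimal Python sketch makes the comparison between Examples 5.24 and 5.25 concrete. It generates both access sequences for a small list and counts them. It is illustrative only. The list function f of Example 5.22 is not reproduced here, so the field contents below are hypothetical, and so are the helper names `separate_sequence`, `interleaved_sequence` and `build`. The sketch relies only on the rule visible in the access sequences: after the first cell of a field, read the next cell if it holds 1, or the cell after that if it holds 0.

```python
def separate_sequence(pointer, lst, i):
    """Example 5.24: pointer 1^|d| 0 and list in separate sections, both from cell 0."""
    seq = []
    for j in range(i):                  # pointer cells 0, 1, ..., i-1
        seq.append(("ptr", j))
        if pointer[j] == 0:             # endmarker found: |d| = j < i, answer is empty
            return seq
    base = 3 * (i - 1)                  # field i occupies list cells 3(i-1) .. 3(i-1)+2
    seq.append(("list", base))
    seq.append(("list", base + 1 if lst[base] == 1 else base + 2))
    return seq


def interleaved_sequence(mem, i):
    """Example 5.25: one memory, pointer bit j in cell 3+4j, field i from cell 4(i-1)."""
    seq = []
    for j in range(i):
        cell = 3 + 4 * j
        seq.append(cell)
        if mem[cell] == 0:
            return seq
    base = 4 * (i - 1)
    seq.append(base)
    seq.append(base + 1 if mem[base] == 1 else base + 2)
    return seq


def build(fields):
    """Lay out a list whose fields are given as hypothetical {offset: bit} dicts."""
    pointer = {j: 1 for j in range(len(fields))}
    pointer[len(fields)] = 0            # the pointer 1^|d| 0
    lst, mem = {}, {}
    for i, field in enumerate(fields):  # i is 0-based here
        for off, bit in field.items():
            lst[3 * i + off] = bit      # separate list section (Example 5.24)
            mem[4 * i + off] = bit      # combined memory (Example 5.25)
    for j, bit in pointer.items():
        mem[3 + 4 * j] = bit
    return pointer, lst, mem


pointer, lst, mem = build([{0: 1, 1: 0}, {0: 0, 2: 1}, {0: 1, 1: 1}])
for i in range(1, 6):
    print(i, len(separate_sequence(pointer, lst, i)), len(interleaved_sequence(mem, i)))
```

For this three-element list, both layouts read $i+2$ cells when $i \le 3$ and $|d|+1 = 4$ cells otherwise. The combined memory also occupies exactly as many cells as the two separate sections together.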


Example 5.25 illustrates an encoding for a separate pointer scheme. Notice that this encoding did not affect the order in which memory cell contents were determined; it simply altered the memory cell numbers in which this information was found. We can show in general that there is no harm in using a separate pointer scheme if it makes our coding job easier. For any separate pointer representation $\rho$ there is a pointer representation $\rho'$ without a separate pointer that achieves the same storage and access costs.


Theorem 5.11. Given any pointer representation $\rho$ with a separate pointer, there exists a pointer representation $\rho'$ without a separate pointer such that

$|\rho(d)| = |\rho'(d)|$ for all $d \in \mathbb{D}$

and $A[f_i(\rho(d))] = A[f_i(\rho'(d))]$ for any operation $f_i$.


Proof: Suppose the representation $\rho:\mathbb{D} \to \mathcal{B}^+$ has a separate pointer and is defined by

$\rho(d) = f(d) \cup \ell(|d|)$.

We can define a representation $\rho':\mathbb{D} \to \mathcal{B}^+$ without a separate pointer by

$\rho'(d) = f'(d) \cup \ell'(|d|)$,

where $f'(d) = \{(2n, m(n)) \mid n \in D(f(d))\}$ and $\ell'(|d|) = \{(2n+1, m(n)) \mid n \in D(\ell(|d|))\}$. That is, list cells are sent to even cell numbers and pointer cells to odd ones. Since

$D(f'(d)) \cap D(\ell'(|d|)) = \emptyset$,

it is clear that $|\rho(d)| = |\rho'(d)|$ for all $d \in \mathbb{D}$. Also, any access sequence to perform an operation $f_i$ using $\rho'$ can be mapped in an obvious way to an access sequence to perform $f_i$ using representation $\rho$, and conversely. The map sends cell $2n$ to list cell $n$ and cell $2n+1$ to pointer cell $n$. □
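The construction in the proof of Theorem 5.11 amounts to a renumbering of cells. The following Python sketch models memory contents as hypothetical `{cell: symbol}` dictionaries. It sends list cell $n$ to $2n$ and pointer cell $n$ to $2n+1$, and translates accesses the same way, so the storage and the symbols read are unchanged. The function names are illustrative only.

```python
def merge_separate(list_part, pointer_part):
    """Theorem 5.11 construction: list cell n -> cell 2n, pointer cell n -> cell 2n+1."""
    merged = {2 * n: s for n, s in list_part.items()}
    merged.update({2 * n + 1: s for n, s in pointer_part.items()})
    return merged


def translate(access):
    """Map an access ('list', n) or ('ptr', n) in the separate layout to a merged cell."""
    component, n = access
    return 2 * n if component == "list" else 2 * n + 1


list_part = {0: "0", 1: "1", 2: "1"}    # hypothetical list component f(d)
pointer_part = {0: "1", 1: "1"}         # hypothetical pointer component
merged = merge_separate(list_part, pointer_part)
print(len(merged) == len(list_part) + len(pointer_part))   # same storage
print(merged[translate(("ptr", 1))] == pointer_part[1])    # same cell contents
```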


Recall that no endmarker representation can achieve Kraft storage for finite $\mathbb{D}$. This is not the case for pointer representations, as the following example shows.


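Example 5.26. Let $X = \{a, b\}$, $\mathcal{B} = \{0, 1\}$, and $\mathbb{D} = \bigcup_{i \in \{0,1,2,3\}} X^i$. Define the function $f: X \to \mathcal{B}^+$ by

$f(a) = 0$
$f(b) = 1$

and the concatenation-preserving function $f':\mathbb{D} \to \mathcal{B}^+$ by

$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{i-1}$.

Let the pointer $\ell: \{0,1,2,3\} \to \mathcal{B}^+$ be defined so that

$\ell(0) = 00$
$\ell(1) = 01$
$\ell(2) = 10$
$\ell(3) = 11$

Then we define the representation $\rho:\mathbb{D} \to \mathcal{B}^+$ by

$\rho(d) = \{\ell(|d|)\}_0 \cup \{f'(d)\}_2$.

For instance,

$\rho(\lambda) = 00$
$\rho(a) = 010$
$\rho(b) = 011$
$\rho(aa) = 1000$
$\rho(abb) = 11011$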


The representation $\rho$ achieves Kraft storage, because

$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = \sum_{i=0}^{3} 2^i \cdot 2^{-(i+2)} = 4 \cdot 2^{-2} = 1$. □
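Example 5.26 is small enough to check exhaustively. The following Python sketch assumes, as in the example, that every memory configuration is a contiguous block starting at cell 0, so it can be written as a string. Under that assumption, two configurations are indistinguishable exactly when one string is a prefix of the other. The sketch verifies the Kraft sum with exact rational arithmetic and checks that the codes are prefix-free.

```python
from fractions import Fraction
from itertools import product

F = {"a": "0", "b": "1"}                           # f(a) = 0, f(b) = 1
ELL = {0: "00", 1: "01", 2: "10", 3: "11"}         # the pointer ell

def rho(d):
    """rho(d) = {ell(|d|)}_0 U {f'(d)}_2, written as the string in cells 0, 1, 2, ..."""
    return ELL[len(d)] + "".join(F[x] for x in d)

domain = ["".join(t) for k in range(4) for t in product("ab", repeat=k)]
codes = [rho(d) for d in domain]

kraft = sum(Fraction(1, 2 ** len(c)) for c in codes)
prefix_free = all(not b.startswith(a) for a in codes for b in codes if a != b)
print(kraft, prefix_free)
```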


So we know from Examples 5.22 and 5.26 that a pointer representation may achieve Kraft storage whether $\mathbb{D}$ is infinite or finite.

Let us try to determine under what conditions a pointer representation $\rho$ does achieve Kraft storage. The following theorem shows that the pointer $\ell$ must itself achieve Kraft storage in order for $\rho$ to achieve Kraft storage.


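Theorem 5.12. Let $f$ be a total function $f:\mathbb{D} \to \mathcal{B}^+$ and let $\ell: J \to \mathcal{B}^+$ be a representation which does not achieve Kraft storage. Then the pointer representation $\rho:\mathbb{D} \to \mathcal{B}^+$, where

$\rho(d) = \{f(d)\}_{s(d)} \cup \{\ell(|d|)\}_{t(d)}$

for some placements $s(d)$ and $t(d)$ of the two components, does not achieve Kraft storage.

Proof: Since $\ell$ is a representation which does not achieve Kraft storage,

$\sum_{i \in J} |\mathcal{B}|^{-|\ell(i)|} < 1$.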


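We first show that the result holds for a separate pointer representation $\rho':\mathbb{D} \to \mathcal{B}^+$, where

$\rho'(d) = f(d) \cup \ell(|d|)$.

Assume that the representation $\rho'$ does attain Kraft storage. Then

$1 = \sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho'(d)|} = \sum_{i \in J} |\mathcal{B}|^{-|\ell(i)|} \Big( \sum_{d \in X^i} |\mathcal{B}|^{-|f(d)|} \Big)$.

Since the weights $|\mathcal{B}|^{-|\ell(i)|}$ sum to less than 1, there exists $k \in J$ such that

$\sum_{d \in X^k} |\mathcal{B}|^{-|f(d)|} > 1$.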
So $f$ is not a representation on $X^k$, and there exist $d_1, d_2 \in X^k$ such that $f(d_1)$ and $f(d_2)$ are indistinguishable. But since $|d_1| = |d_2| = k$, the configurations

$\rho'(d_1) = f(d_1) \cup \ell(k)$
and $\rho'(d_2) = f(d_2) \cup \ell(k)$

are indistinguishable, contradicting the fact that $\rho'$ is a representation. Thus, $\rho'$ cannot achieve Kraft storage if $\ell$ doesn't. Since

$|\rho'(d)| = |f(d)| + |\ell(|d|)| = |\rho(d)|$,

$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho'(d)|} \ne 1$ implies that $\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} \ne 1$. So the pointer representation $\rho$ cannot achieve Kraft storage if $\ell$ doesn't. □


Thus, the pointer $\ell$ achieving Kraft storage is a necessary, although certainly not sufficient, condition for the pointer representation $\rho$ to achieve Kraft storage.

We frequently consider a pointer representation formed from a concatenation-preserving function $f'$ and a pointer $\ell$. Suppose that concatenation-preserving function $f'$ is based on a function $f: X \to \mathcal{B}^+$ which itself is a representation and attains Kraft storage. We now show that the pointer representation $\rho$ then also achieves Kraft storage, assuming, of course, that the pointer $\ell$ achieves Kraft storage.


Theorem 5.13. Let $\mathbb{D} = \bigcup_{i \in J} X^i$ and consider a representation function $f: X \to \mathcal{B}^+$ which achieves Kraft storage. Let $f':\mathbb{D} \to \mathcal{B}^+$ be a concatenation-preserving function formed from $f$ and defined by

$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{c_i}$,

for some fixed field positions $c_i$. If the representation $\ell: J \to \mathcal{B}^+$ attains Kraft storage, then the pointer representation $\rho:\mathbb{D} \to \mathcal{B}^+$ also achieves Kraft storage, where

$\rho(d) = \{f'(d)\}_{s(d)} \cup \{\ell(|d|)\}_{t(d)}$.


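Proof: Since $|\rho(d)| = |f'(d)| + |\ell(|d|)|$,

$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{i \in J} \sum_{d \in X^i} |\mathcal{B}|^{-(|f'(d)| + |\ell(i)|)} = \sum_{i \in J} |\mathcal{B}|^{-|\ell(i)|} \Big( \sum_{d \in X^i} |\mathcal{B}|^{-|f'(d)|} \Big)$.

Since $f$ achieves Kraft storage, we can make use of Lemma 5.4, which gives us

$\sum_{d \in \mathbb{D}} |\mathcal{B}|^{-|\rho(d)|} = \sum_{i \in J} |\mathcal{B}|^{-|\ell(i)|} = 1$. □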


We now want to determine the conditions, if any, under which a pointer representation can achieve Kraft access for the set $\Gamma$ of table lookup questions and also achieve Kraft storage.


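Theorem 5.14. Let $\mathbb{D} = \bigcup_{i \in J} X^i$. If a pointer representation $\rho:\mathbb{D} \to \mathcal{B}^+$ achieves Kraft storage and also achieves Kraft access for all $\gamma_i \in \Gamma$, then

$|\mathcal{B}| = 2$ and $\mathbb{D} = \{\lambda\} \cup X^n$ for some $n \in \mathbb{N}^+$.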


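Proof: Theorem 5.12 guarantees that if $\rho$ achieves Kraft storage, then its pointer function $\ell: J \to \mathcal{B}^+$ must also achieve Kraft storage. Since every codeword $\ell(i)$ is nonempty and $|\mathcal{B}| \ge 2$, it must be the case that $|J| \ge 2$. Thus, $\mathbb{D} \ne X^n$. Recalling Theorems 4.9 and 4.10, if a representation $\rho$ achieves Kraft storage and Kraft access for all $\gamma_i \in \Gamma$, then $\mathbb{D} = X^n$ or $\mathbb{D} = \{\lambda\} \cup X^n$. Since the former is not true, the only possibility is that $\mathbb{D} = \{\lambda\} \cup X^n$. So if we are to achieve Kraft storage and access at all, then $|J| = 2$. Two nonempty codewords satisfy $\sum_{i \in J} |\mathcal{B}|^{-|\ell(i)|} = 1$ only when $|\mathcal{B}| = 2$, so $|\mathcal{B}| = 2$. □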


Theorem 5.14 simply says that if a pointer representation is to achieve Kraft storage and access, then $|\mathcal{B}| = 2$ and $\mathbb{D} = \{\lambda\} \cup X^n$. It does not say that it is ever possible to achieve both. The following example shows, however, that it is possible.
Example 5.27. Let $X = \{a, b, c\}$, $\mathcal{B} = \{0, 1\}$, and $\mathbb{D} = \{\lambda\} \cup X^3$. We want to construct a pointer representation $\rho$ which achieves Kraft storage and also achieves Kraft access for $\Gamma = \{\gamma_1, \gamma_2, \gamma_3\}$. To do so, we first define a function $f: X \to \mathcal{B}^+$:

$f(a) = 00$
$f(b) = 01$
$f(c) = 1$

We then let $f':\mathbb{D} \to \mathcal{B}^+$ be the concatenation-preserving function formed from $f$:

$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{2(i-1)}$.

The pointer function $\ell: J \to \mathcal{B}^+$ is defined by

$\ell(0) = 0$
$\ell(3) = 1$

Then the pointer representation $\rho:\mathbb{D} \to \mathcal{B}^+$ can be defined by

$\rho(d) = \{\ell(|d|)\}_0 \cup \{f'(d)\}_1$.

The representation $\rho$ achieves Kraft storage, because

$\sum_{d \in \mathbb{D}} 2^{-|\rho(d)|} = \sum_{d \in \{\lambda\} \cup X^3} 2^{-(|f'(d)| + |\ell(|d|)|)} = 2^{-1} \sum_{d \in \{\lambda\} \cup X^3} 2^{-|f'(d)|} = 2^{-1}(2^0 + 2^{-3} + 6 \cdot 2^{-4} + 12 \cdot 2^{-5} + 8 \cdot 2^{-6}) = 1$.


We can construct access trees for $\gamma_1, \gamma_2, \gamma_3$ as shown in Figure 5.11. By observation, each achieves Kraft access. □
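Here is a brute-force check of Example 5.27 in Python. The sketch assumes a particular layout: the pointer $\ell(|d|)$ sits in cell 0, and the code of the $i$th element starts at cell $1 + 2(i-1)$, giving fields of width $\max_x |f(x)| = 2$. That layout, and the helper names, are assumptions of this sketch. Storage is counted as the number of occupied cells. Kraft access is checked by confirming two things: each answer of $\gamma_i$ is reached after a fixed number of accesses, and $\sum 2^{-\text{depth}}$ over the answers ($\emptyset$, $a$, $b$, $c$) equals 1.

```python
from fractions import Fraction
from itertools import product

F = {"a": "00", "b": "01", "c": "1"}
DECODE = {code: x for x, code in F.items()}

def rho(d):
    """Memory as {cell: bit}: pointer in cell 0, element i in cells from 1 + 2(i-1)."""
    mem = {0: "1" if d else "0"}              # ell(3) = 1, ell(0) = 0
    for i, x in enumerate(d):                 # i is 0-based here
        for t, bit in enumerate(F[x]):
            mem[1 + 2 * i + t] = bit
    return mem

def gamma(mem, i):
    """Answer gamma_i, returning (answer, number of cells accessed); None means empty."""
    if mem[0] == "0":
        return None, 1
    cell, code = 1 + 2 * (i - 1), ""
    while code not in DECODE:                 # descend the code tree of f
        code += mem[cell]
        cell += 1
    return DECODE[code], 1 + len(code)

domain = [()] + list(product("abc", repeat=3))
storage = sum(Fraction(1, 2 ** len(rho(d))) for d in domain)

def kraft_access(i):
    depth = {}
    for d in domain:
        answer, reads = gamma(rho(d), i)
        assert depth.setdefault(answer, reads) == reads   # one leaf per answer
    return sum(Fraction(1, 2 ** r) for r in depth.values())

print(storage, [kraft_access(i) for i in (1, 2, 3)])
```

Both the storage sum and each of the three access sums come out to exactly 1 under this layout.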
This example can be generalized, giving us the following theorem.


Theorem 5.15. Consider any domain of the form $\mathbb{D} = \{\lambda\} \cup X^n$, $|X| > 1$, and assume that $|\mathcal{B}| = 2$. Then there is a concatenation-preserving pointer representation $\rho:\mathbb{D} \to \mathcal{B}^+$ which achieves Kraft storage and for which it is possible to implement the table lookup questions $\Gamma = \{\gamma_i \mid 1 \le i \le n\}$ so that each $\gamma_i$ achieves Kraft access.


[Figure: access trees for $\gamma_1$, $\gamma_2$, $\gamma_3$, each rooted at cell 0]

Figure 5.11. Access trees for $\gamma_1, \gamma_2, \gamma_3$ of Example 5.27.


Proof: The construction is like that in Example 5.27. We first define a function $f: X \to \mathcal{B}^+$ such that $f$ achieves Kraft storage. This is possible because there exists $n_1 \in \mathbb{N}$ such that $|X| = (|\mathcal{B}| - 1) \cdot n_1 + 1 = n_1 + 1$. A tree $T$ for $f$ has $n_1$ internal nodes, for which we can choose labels from the set $\{0, 1, \ldots, n_1 - 1\}$. We now define the concatenation-preserving function $f':\mathbb{D} \to \mathcal{B}^+$ formed from $f$:

$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_1(i-1)}$.

The pointer function $\ell: J \to \mathcal{B}^+$ is defined by

$\ell(0) = 0$
$\ell(n) = 1$

From these we define the pointer representation $\rho:\mathbb{D} \to \mathcal{B}^+$:

$\rho(d) = \{\ell(|d|)\}_0 \cup \{f'(d)\}_1$.

By Theorem 5.13, since $f$ and $\ell$ achieve Kraft storage, so does $\rho$. Also, if $T$ is the full tree corresponding to $f$, then the access tree for any $\gamma_i \in \Gamma$ is of the form shown in Figure 5.12. Thus, $\rho$ achieves both Kraft access and Kraft storage. □


We have seen by Theorem 5.14 that a pointer representation can achieve Kraft storage and access only for $|\mathcal{B}| = 2$ and $\mathbb{D} = \{\lambda\} \cup X^n$. Let us try to see when it is at least possible to achieve Kraft access. The following example presents such a scheme, but the resulting pointer storage cost is high: $|J| - 1$.
Figure 5.12. Access tree for $\gamma_i$ in the proof of Theorem 5.15.


Example 5.28. Let $\mathcal{B} = \{0, 1\}$, $X = \{a, b, c, d\}$, and $\mathbb{D} = \bigcup_{i \in J} X^i$ for $J = \{0, 3, 5, 6\}$. Let $f: X \to \mathcal{B}^+$ be a code with $|f(x)| = 2$ for every $x \in X$. Define from $f$ the concatenation-preserving representation $f':\mathbb{D} \to \mathcal{B}^+$ so that

$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{2(i-1)}$.

(a) Define a length function $\ell_1: J \to \mathcal{B}^+$ by

$\ell_1(|d|) = 1^{|d|} 0^{6-|d|}$.

Then a pointer representation $\rho_1:\mathbb{D} \to \mathcal{B}^+$ can be defined by

$\rho_1(d) = \{f'(d)\}_6 \cup \{\ell_1(|d|)\}_0$.

Notice that $|\rho_1(d)| = |f'(d)| + 6 = 2|d| + 6$. Since $\ell_1$ does not achieve Kraft storage, we know that $\rho_1$ does not either. On the other hand, we can implement each $\gamma_i \in \Gamma$ so as to achieve Kraft access. We do this by first reading the $i$th bit of $\ell_1(|d|)$, which is in cell $i-1$. If $m(i-1) = 0$, then we know $|d| < i$ and so $\gamma_i(\rho_1(d)) = \emptyset$. If instead $m(i-1) = 1$, then we know $\gamma_i(\rho_1(d)) \ne \emptyset$. We then look in locations $2(i-1) + 6 = 2i + 4$ and $2i + 5$ to determine $\gamma_i(\rho_1(d))$. Thus, each $\gamma_i$ can be implemented by an access tree as shown in Figure 5.13.
Figure 5.13. Access tree for $\gamma_i$ of Example 5.28a.

(b) Recalling Theorem 4.14 leads us to try to find a length function $\ell_2$ with $|\ell_2(|d|)| = |J| - 1$. Since we know, for instance, that no list of length 4 is in $\mathbb{D}$, $\gamma_4(\rho(d)) = \emptyset$ if and only if $\gamma_5(\rho(d)) = \emptyset$. So we define the length function $\ell_2: J \to \mathcal{B}^+$ by

$\ell_2(|d|) = \begin{cases} 000 & \text{if } |d| = 0 \\ 100 & \text{if } |d| = 3 \\ 110 & \text{if } |d| = 5 \\ 111 & \text{if } |d| = 6 \end{cases}$

Then the pointer representation $\rho_2:\mathbb{D} \to \mathcal{B}^+$ is defined by

$\rho_2(d) = \{f'(d)\}_3 \cup \{\ell_2(|d|)\}_0$.

Once again, $\rho_2$ cannot achieve Kraft storage since $\ell_2$ doesn't. But we can implement each $\gamma_i \in \Gamma$ so as to achieve Kraft access, as shown in Figure 5.14. Notice that, for all $d \in \mathbb{D}$, $|\rho_2(d)| \ge 3 = |J| - 1$, as required by Theorem 4.14. □


[Figure: each access tree reads one pointer cell, then list cells $2i+1$ and $2i+2$, with leaves $a, b, c, d$]

Figure 5.14. Access trees for all $\gamma_i \in \Gamma$ from Example 5.28b.


The method used in Example 5.28b can be generalized so that it is always possible, when $|\mathcal{B}| = 2$, to construct a pointer representation that achieves Kraft access.
Theorem 5.16. Let $\mathbb{D} = \bigcup_{i \in J} X^i$ and let $|\mathcal{B}| = 2$. Then it is possible to construct a concatenation-preserving pointer representation $\rho:\mathbb{D} \to \mathcal{B}^+$ such that each $\gamma_i \in \Gamma$ can be implemented so as to achieve Kraft access.


Proof: Construct some representation $f: X \to \mathcal{B}^+$ such that $f$ achieves Kraft storage. Since $|\mathcal{B}| = 2$, it is always possible to do this; $f$ corresponds to some full tree $T$. Let $k = \max_{x \in X} |f(x)|$ and define the concatenation-preserving function $f':\mathbb{D} \to \mathcal{B}^+$ by

$f'(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{k(i-1)}$.

We define the length function $\ell: J \to \mathcal{B}^+$ in such a way that $|\ell(i)| = |J| - 1$ for all $i \in J$. First, index the elements in $J$ so that $J = \{i_0, i_1, \ldots, i_{|J|-1}\}$, where $i_0 < i_1 < \cdots < i_{|J|-1}$. Then define

$\ell(i_r) = 1^r 0^{|J|-1-r}$.

The separate pointer representation $\rho:\mathbb{D} \to \mathcal{B}^+$ defined by

$\rho(d) = f'(d) \cup \ell(|d|)$

can be implemented so as to achieve Kraft access. For instance, $\gamma_j \in \Gamma$ can be implemented as follows. Determine the least value $i_n \in J$ such that $j \le i_n$. Then an access to cell $n-1$ of the pointer indicates whether or not $\gamma_j(\rho(d)) = \emptyset$:

$m(n-1) = 0 \Rightarrow \gamma_j(\rho(d)) = \emptyset$
$m(n-1) = 1 \Rightarrow \gamma_j(\rho(d)) \ne \emptyset$

If $\gamma_j(\rho(d)) \ne \emptyset$, then we can go to cell $k(j-1)$ of the list function $f'(d)$. Figure 5.15 illustrates an access tree for $\gamma_j$, where the nodes of $T$ correspond to memory cells of the list component $f'(d)$. □
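The length function in the proof of Theorem 5.16, and the choice of which pointer cell to read, can be sketched in Python as follows. The sketch indexes $J$ from $i_0$ and writes $\ell(i_r) = 1^r 0^{|J|-1-r}$. It assumes $0 \in J$ and $1 \le j \le \max J$ (as in Example 5.28), so that a suitable $n \ge 1$ exists. The function names are hypothetical. For $J = \{0, 3, 5, 6\}$ it reproduces the codes $000, 100, 110, 111$ of Example 5.28b. It also checks that the single pointer bit read by $\gamma_j$ is 1 exactly when $|d| \ge j$.

```python
def length_codes(J):
    """ell(i_r) = 1^r 0^(|J|-1-r), where i_0 < i_1 < ... < i_{|J|-1} enumerate J."""
    js = sorted(J)
    return {i: "1" * r + "0" * (len(js) - 1 - r) for r, i in enumerate(js)}

def pointer_cell(J, j):
    """Cell n-1 of the pointer, where i_n is the least element of J with j <= i_n.

    Assumes 0 is in J and 1 <= j <= max(J), so that n >= 1 exists."""
    js = sorted(J)
    n = next(r for r, i in enumerate(js) if j <= i)
    return n - 1

J = {0, 3, 5, 6}                       # the domain of Example 5.28
codes = length_codes(J)
print(codes)
for j in range(1, max(J) + 1):
    cell = pointer_cell(J, j)
    # bit 1 in that cell means |d| >= j, i.e. gamma_j is nonempty
    assert all((codes[L][cell] == "1") == (L >= j) for L in J)
```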


Although the pointer representation constructed in Theorem 5.16 can achieve Kraft access, this comes at a potentially very high storage cost, since $|\rho(d)| \ge |J| - 1$ for all $d \in \mathbb{D}$. Unfortunately, by Theorem 4.14 we know that we cannot uniformly improve this storage. In other words, if we insist on Kraft access for all $\gamma_i \in \Gamma$, then we are stuck with $|\rho(d)| \ge |J| - 1$.


[Figure: access tree for $\gamma_j$; note that cell $n-1$ is in the pointer representation, and the remaining nodes form the tree $T$]

Figure 5.15. Access tree for $\gamma_j \in \Gamma$ in the proof of Theorem 5.16.


Theorem 5.15 presented a method for constructing a pointer representation so as to achieve Kraft access, but only for the case $|\mathcal{B}| = 2$. This leads us to wonder whether it is possible to extend the result to $|\mathcal{B}| > 2$. The following theorem shows that, for $|\mathcal{B}| > 2$, a pointer representation cannot implement each $\gamma_i \in \Gamma$ so as to achieve Kraft access, unless the pointer component is "superfluous".


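Theorem 5.17. Let $|\mathcal{B}| > 2$ and consider a function $f:\mathbb{D} \to \mathcal{B}^+$. Let $\rho:\mathbb{D} \to \mathcal{B}^+$ be a pointer representation

$\rho(d) = f(d) \cup \ell(|d|)$,

where $\ell$ is a representation $\ell: J \to \mathcal{B}^+$. If $f$ is not by itself a representation of $\mathbb{D}$, then $\rho$ does not achieve Kraft access for all $\gamma_i \in \Gamma$.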


Proof: Let the function $f$ not be a representation, and assume $\rho$ does achieve Kraft access for all $\gamma_i \in \Gamma$. Since $f$ is not a representation, there exists $\gamma_k \in \Gamma$ whose access tree $T_k$ has an internal node labelled $r \in D(\ell(|d|))$. By Theorem 4.1 and Corollary 4.2.1, since $\gamma_k$ achieves Kraft access, it has $|X| + 1$ leaves with distinct labels from the set $X \cup \{\emptyset\}$ (or $|X|$ leaves if $\mathbb{D} = X^n$), and node $r$ has $|\mathcal{B}|$ branches. Let one of the branches from node $r$ eventually lead to some leaf labelled $x_1 \in X$, and another branch from $r$ eventually lead to a leaf labelled $x_2 \in X$. There is some $d_1 \in \mathbb{D}$ such that $d_1(k) = x_1$, $r \in [L_{\gamma_k}(\rho(d_1))]$, and $r \in D(\ell(|d_1|))$, where $m_1(r) = b \in \mathcal{B}$ for $m_1 \supseteq \rho(d_1)$. Let

$d_2 = \{(n, d_1(n)) \mid 1 \le n \le |d_1|,\ n \ne k\} \cup \{(k, x_2)\}$.

In other words, $d_2$ differs from $d_1$ only in its $k$th element. By the definition of $\mathbb{D}$, $d_1 \in \mathbb{D}$ implies that $d_2 \in \mathbb{D}$. The access path for $d_2$ must leave $r$ along a different branch to reach $x_2$, so $m_2(r) = b' \in \mathcal{B}$, where $b' \ne b$, for $m_2 \supseteq \rho(d_2)$. But since $|d_1| = |d_2|$, we have $\ell(|d_1|) = \ell(|d_2|)$, and so $m_1(r) = m_2(r)$, since $r \in D(\ell(|d_1|))$. This gives a contradiction. Thus, $\rho$ cannot achieve Kraft access for all $\gamma_i \in \Gamma$. □


Thus, if a pointer representation achieves Kraft access for all $\gamma_i \in \Gamma$, then the list component $f$ was itself a representation, and so we need not have stored any pointer at all. Effectively, this says that all $\gamma_i \in \Gamma$ cannot achieve Kraft access with a pointer representation in which the pointer is in fact needed to store length information. In particular, a concatenation-preserving pointer representation cannot achieve Kraft access, since a concatenation-preserving function $f'$ is not a representation (except in the trivial case where $X^k \not\subseteq \mathbb{D}$ for all $k \ge 1$).


Corollary 5.17.1. Let $\mathbb{D} = \bigcup_{i \in J} X^i$, where $\max_{i \in J} i \ge 1$. If $|\mathcal{B}| > 2$, no concatenation-preserving pointer representation can achieve Kraft access for all $\gamma_i \in \Gamma$.


We have seen that for a pointer representation we in general cannot hope to achieve Kraft access. On the other hand, we know that we can actually achieve Kraft storage. So let us discuss how well we can do for access costs if we insist on Kraft storage. This is the approach we take for the rest of this section. We shall see that pointer representations can, in fact, be quite efficient in terms of access as well as storage costs.

Recall the pointer representation scheme used in Example 5.24. Since the list component $f'$ had fixed position fields, we could immediately (and with Kraft access) determine the answer to any $\gamma_i \in \Gamma$ as soon as we knew the answer was not $\emptyset$. So in order to answer a table lookup question $\gamma_i$, we read enough of the pointer to know whether or not $|d| \ge i$. Since the pointer function $\ell: J \to \mathcal{B}^+$ was defined by $\ell(|d|) = 1^{|d|}0$, this meant we had to read $i$ bits of the pointer for $|d| \ge i$ and $|d| + 1$ bits of the pointer for $|d| < i$. We shall present a scheme to reduce the length $|\ell(|d|)|$, which therefore reduces the cost of accessing the pointer.
For $|\mathcal{B}| = 2$ we saw in Example 5.22 the pointer representation where $\ell(n) = 1^n 0$; this is essentially a unary representation of $n$ followed by an endmarker. It would, of course, be desirable to represent $n$ in binary instead. That would decrease the storage cost but would create the problem of detecting the end of the string; i.e., we need some way to guarantee that $\ell$ is a representation. Since $D(\ell) = \mathbb{N}$, we use a universal encoding method as described by Elias [7]. In this scheme we successively encode, in binary, the length of the result of the previous encoding. For instance, we could represent $|d|$ as a binary string $s$, which would have length $|s| \approx \log_2|d|$. If we were to use, say, a unary encoding to specify $|s|$, then we could write $\ell'(|d|) = 0^{|s|} 1 s$, which gives us

$|\ell'(|d|)| = 2|s| + 1 \approx 2\log_2|d| + 1$.

This is an improvement for large $|d|$ over our previous scheme's cost of $|\ell(|d|)| = |d| + 1$. In the following example we present an encoding scheme for the case $|\mathcal{B}| = 2$.


Example 5.29. Recall the fixed position field concatenation-preserving function $f'$ from Example 5.22. Our concern here is with finding an efficient length representation $\ell: J \to \mathcal{B}^+$. Assume for simplicity that $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. Rather than defining $\ell(|d|) = 1^{|d|}0$, as we did in Example 5.22, consider representing $|d| = n$ as a binary string as follows:

$n = 0, 1, 2, 3, 4, 5, 6, 7, 8, \ldots$ is represented by $\lambda, 0, 1, 00, 01, 10, 11, 000, 001, \ldots$

More formally, we can define $h: \mathbb{N} \to \mathcal{B}^*$ by letting $h(n)$ be the binary representation of $n+1$ with the leftmost symbol deleted. For example, to determine $h(21)$, we write 22 in binary, 10110, and then delete the leftmost symbol (always a 1): $h(21) = 0110$ (see Table 5.1). Notice that

$|h(n)| = \lfloor \log_2(n+1) \rfloor$.

We now define a pointer representation $\ell^1: \mathbb{N} \to \mathcal{B}^+$ by

$\ell^1(n) = 0^{|h(n)|} 1 h(n)$,

as also shown in Table 5.1. The storage cost for the representation $\ell^1$ is

$|\ell^1(n)| = 2|h(n)| + 1 = 2\lfloor \log_2(n+1) \rfloor + 1$.

We can show that the representation $\ell^1$ achieves Kraft storage by noting that, for each $j \in \mathbb{N}$, $\lfloor \log_2(n+1) \rfloor = j$ for $2^j$ consecutive values of $n$:

$\sum_{n=0}^{\infty} 2^{-|\ell^1(n)|} = \sum_{n=0}^{\infty} 2^{-(2\lfloor \log_2(n+1) \rfloor + 1)} = \sum_{j=0}^{\infty} 2^j \cdot 2^{-(2j+1)} = \sum_{j=0}^{\infty} 2^{-(j+1)} = 1$.
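The function $h$ and the pointer $\ell^1$ are easy to express in code. The Python sketch below builds both; the function names are hypothetical. It decodes an $\ell^1$ codeword from the front of a longer bit string, which shows that the endmarker makes $\ell^1$ self-delimiting. It also checks the partial Kraft sums exactly: summing over the $n$ with $|h(n)| < J$ should give $1 - 2^{-J}$, which tends to 1.

```python
from fractions import Fraction

def h(n):
    """Binary representation of n + 1 with its leftmost symbol (a 1) deleted."""
    return bin(n + 1)[3:]              # bin() returns '0b1...'; drop '0b1'

def ell1(n):
    """ell^1(n) = 0^{|h(n)|} 1 h(n)."""
    s = h(n)
    return "0" * len(s) + "1" + s

def decode_ell1(bits):
    """Read one ell^1 codeword from the front of bits; return (n, cells read)."""
    k = bits.index("1")                # the number of leading 0s is |h(n)|
    s = bits[k + 1 : 2 * k + 1]
    return int("1" + s, 2) - 1, 2 * k + 1

# Partial Kraft sums: n = 0 .. 2^J - 2 are exactly the n with |h(n)| < J.
J = 10
partial = sum(Fraction(1, 2 ** len(ell1(n))) for n in range(2 ** J - 1))
print(h(21), ell1(21), partial == 1 - Fraction(1, 2 ** J))
```

Here $h(21)$ comes out as 0110, matching the worked example in the text.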


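Thus, a worst case access cost to determine whether or not $\gamma_i(d) = \emptyset$ is just $2\lfloor \log_2(n+1) \rfloor + 1$. This is an improvement over the scheme in Example 5.22 (or Example 5.24), which had a worst case of $n + 1$. In general, we can expect to do even better than this, reading only as much of the pointer as necessary. For this particular example, only two accesses of the list representation are required to read the answer $\gamma_i(d)$. Writing $n = |d|$, we therefore have the following access costs:

$A[\gamma_i(d)] = \begin{cases} \lceil \log_2(n+2) \rceil & \text{for } n < 2^{\lceil \log_2(i+2) \rceil - 1} - 1 \\ \le 2\lfloor \log_2(n+1) \rfloor + 3 & \text{for } 2^{\lceil \log_2(i+2) \rceil - 1} - 1 \le n < 2^{\lceil \log_2(i+2) \rceil} - 1 \\ \lceil \log_2(i+2) \rceil + 2 & \text{for } n \ge 2^{\lceil \log_2(i+2) \rceil} - 1 \end{cases}$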
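The three cases above can be checked by brute force. The Python sketch below assumes the reading strategy described in the text. First, read the unary prefix of $\ell^1(n)$ until its length settles the comparison with $i$. Otherwise, compare $h(n)$ with $h(i)$ bit by bit, stopping at the first difference. Two list accesses follow when the answer is nonempty. The stopping rule in the equal-length case is just one simple choice, so the middle case is checked only as an upper bound. The helper names are hypothetical.

```python
def h(n):
    return bin(n + 1)[3:]              # binary of n + 1 without its leading 1

def ell1(n):
    return "0" * len(h(n)) + "1" + h(n)

def lookup_cost(n, i, list_reads=2):
    """Cells read to answer gamma_i when |d| = n and the pointer is ell^1(n).

    Pointer cells are read one at a time until the bits seen settle whether
    n >= i; list_reads further accesses follow if the answer is nonempty."""
    code, hi = ell1(n), h(i)
    li, reads = len(hi), 0
    for cell in range(li + 1):         # phase 1: the unary prefix, at most li+1 cells
        reads += 1
        if code[cell] == "1":
            break
    else:                              # li+1 zeros: |h(n)| > |h(i)|, hence n > i
        return reads + list_reads
    if cell < li:                      # |h(n)| < |h(i)|, hence n < i: answer is empty
        return reads
    for t in range(li):                # phase 2: equal lengths, compare h(n) with h(i)
        reads += 1
        if code[li + 1 + t] != hi[t]:  # first differing bit decides the comparison
            break
    return reads + (list_reads if n >= i else 0)

def clog2(x):                          # ceil(log2 x) for x >= 1
    return (x - 1).bit_length()

def flog2(x):                          # floor(log2 x) for x >= 1
    return x.bit_length() - 1

for i in range(1, 60):
    c = clog2(i + 2)
    for n in range(300):
        cost = lookup_cost(n, i)
        if n < 2 ** (c - 1) - 1:
            assert cost == clog2(n + 2)
        elif n >= 2 ** c - 1:
            assert cost == c + 2
        else:
            assert cost <= 2 * flog2(n + 1) + 3
```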
Using the same trick over again, we can encode the length of the length of $n$ by defining the pointer representation $\ell^2: \mathbb{N} \to \mathcal{B}^+$ by

$\ell^2(n) = 0^{|h(|h(n)|)|} 1 h(|h(n)|) h(n)$,

giving a storage cost of

$|\ell^2(n)| = 2|h(|h(n)|)| + |h(n)| + 1 = 2\lfloor \log_2(\lfloor \log_2(n+1) \rfloor + 1) \rfloor + \lfloor \log_2(n+1) \rfloor + 1$.

As for $\ell^1$, it can also be verified that the pointer representation $\ell^2$ achieves Kraft storage:

$\sum_{n=0}^{\infty} 2^{-|\ell^2(n)|} = \sum_{n=0}^{\infty} 2^{-(2\lfloor \log_2(\lfloor \log_2(n+1) \rfloor + 1) \rfloor + \lfloor \log_2(n+1) \rfloor + 1)} = \sum_{j=0}^{\infty} 2^j \cdot 2^{-(2\lfloor \log_2(j+1) \rfloor + j + 1)} = \sum_{j=0}^{\infty} 2^{-(2\lfloor \log_2(j+1) \rfloor + 1)} = \sum_{j=0}^{\infty} 2^{-|\ell^1(j)|} = 1$. □

 n   h(n)    ℓ^1(n)         ℓ^2(n)
 0   λ       1              1
 1   0       010            0100
 2   1       011            0101
 3   00      00100          01100
 4   01      00101          01101
 5   10      00110          01110
 6   11      00111          01111
 7   000     0001000        00100000
 8   001     0001001        00100001
 9   010     0001010        00100010
10   011     0001011        00100011
11   100     0001100        00100100
12   101     0001101        00100101
13   110     0001110        00100110
14   111     0001111        00100111
15   0000    000010000      001010000
16   0001    000010001      001010001
17   0010    000010010      001010010
18   0011    000010011      001010011
19   0100    000010100      001010100
20   0101    000010101      001010101
21   0110    000010110      001010110
22   0111    000010111      001010111
23   1000    000011000      001011000
24   1001    000011001      001011001
25   1010    000011010      001011010
26   1011    000011011      001011011
27   1100    000011100      001011100
28   1101    000011101      001011101
29   1110    000011110      001011110
30   1111    000011111      001011111
31   00000   00000100000    0011000000
32   00001   00000100001    0011000001
33   00010   00000100010    0011000010

Table 5.1. Construction of pointer representations $\ell^1$ and $\ell^2$, for $|\mathcal{B}| = 2$.
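The following Python sketch builds $\ell^2$ as $\ell^1(|h(n)|)$ followed by $h(n)$; the names are hypothetical. It prints the rows of Table 5.1. It then checks the regrouping step of the Kraft argument exactly: summing $2^{-|\ell^2(n)|}$ over the $n$ with $|h(n)| < J$ equals summing $2^{-|\ell^1(j)|}$ over $j < J$.

```python
from fractions import Fraction

def h(n):
    return bin(n + 1)[3:]              # binary of n + 1 without its leading 1

def ell1(n):
    return "0" * len(h(n)) + "1" + h(n)

def ell2(n):
    """ell^2(n) = 0^{|h(|h(n)|)|} 1 h(|h(n)|) h(n), i.e. ell^1(|h(n)|) followed by h(n)."""
    return ell1(len(h(n))) + h(n)

for n in range(34):                    # reproduces the rows of Table 5.1
    print(f"{n:>2}  {h(n) or 'λ':<6} {ell1(n):<12} {ell2(n)}")

# Grouping n by j = |h(n)| turns the ell^2 Kraft sum into the ell^1 Kraft sum.
J = 12
lhs = sum(Fraction(1, 2 ** len(ell2(n))) for n in range(2 ** J - 1))
rhs = sum(Fraction(1, 2 ** len(ell1(j))) for j in range(J))
print(lhs == rhs)
```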


The pointer representation construction procedure presented in Example 5.29 can be applied indefinitely, encoding the length of the length of the length of $n$, and so on. It can also be extended to the case where $|\mathcal{B}| > 2$. In order to do this, we make use of a mod-$|\mathcal{B}|$ successor operation, $\oplus_{|\mathcal{B}|}$, on strings. We define $\oplus_{|\mathcal{B}|}$ so that, e.g., $\oplus_2$ corresponds intuitively to addition base 2 with the leftmost 1 deleted:

$0 \oplus_2 1 = 1,\ 1 \oplus_2 1 = 00,\ 00 \oplus_2 1 = 01,\ \ldots,\ 11 \oplus_2 1 = 000,\ \ldots$

For $\oplus_3$ we would obtain the sequence of strings

$1, 2, 00, 01, 02, 10, \ldots, 22, 000, 001, \ldots$


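Definition. Writing the symbols of $\mathcal{B}$ as $0, 1, \ldots, |\mathcal{B}|-1$, consider a string

$s = s_{|s|} s_{|s|-1} \cdots s_3 s_2 s_1 \in \mathcal{B}^*$

and let

$k = \min\{i \mid s_i \ne |\mathcal{B}| - 1\}$.

(If $s_i = |\mathcal{B}| - 1$ for $1 \le i \le |s|$, by convention we have $k = |s| + 1$.) Then we define $s' = s \oplus_{|\mathcal{B}|} 1$ by

$s'_i = \begin{cases} 0 & \text{for } 1 \le i < k \\ s_k + 1 & \text{for } i = k,\ k \le |s| \\ s_i & \text{for } k < i \le |s| \\ 0 & \text{for } i = k,\ k = |s| + 1 \end{cases}$

So $|s'| = |s|$ except when $s = \{|\mathcal{B}|-1\}^{|s|}$, in which case $s' = \{0\}^{|s|+1}$ and $|s'| = |s| + 1$.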
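We can now define a function $h$ as a $|\mathcal{B}|$-ary string representation of a natural number $n$.

Definition. For $|\mathcal{B}| \ge 2$, let $h_{|\mathcal{B}|}$ be the encoding of $n \in \mathbb{N}$ as a $|\mathcal{B}|$-ary string, $h_{|\mathcal{B}|}: \mathbb{N} \to \mathcal{B}^*$, where

$h_{|\mathcal{B}|}(0) = \lambda$
$h_{|\mathcal{B}|}(n) = h_{|\mathcal{B}|}(n-1) \oplus_{|\mathcal{B}|} 1$ for $n \ge 1$.

For $|\mathcal{B}| = 1$ we by convention define $h_1(n) = 0^n$ for any $n \in \mathbb{N}$.

We extend our notation and write $h^{k+1}_{|\mathcal{B}|}(n)$ to indicate $k + 1$ applications of $h_{|\mathcal{B}|}$:

$h^{1}_{|\mathcal{B}|}(n) = h_{|\mathcal{B}|}(n)$, $\quad h^{k+1}_{|\mathcal{B}|}(n) = h_{|\mathcal{B}|}(|h^{k}_{|\mathcal{B}|}(n)|)$.

Where the particular $|\mathcal{B}|$ we are considering is clear, we may simply write $h$ rather than $h_{|\mathcal{B}|}$. For instance, the function $h$ in Table 5.1 corresponds to $h_2$. Notice that

$1h_2(n+1) = 1h_2(n) + 1$,

where $1h_2(n)$ denotes $h_2(n)$ with a 1 prepended, and the addition is in base 2.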


Example 5.30. For $|\mathcal{B}| = 3$, Table 5.2 illustrates $h_3(n)$ and $h_3^2(n)$. To see how we can use the above definitions to determine $h_3(n)$, assume we know $h_3(11) = 21$. So

$h_3(12) = h_3(11) \oplus_3 1 = 21 \oplus_3 1$.

Letting $s = 21 = s_2 s_1$, we have $\min\{i \mid s_i \ne |\mathcal{B}| - 1\} = 1$, and so

$s'_i = \begin{cases} s_1 + 1 & \text{for } i = 1 \\ s_2 & \text{for } i = 2 \end{cases}$

Thus,

$s' = h_3(12) = 22$.

Similarly, $h_3(13) = h_3(12) \oplus_3 1 = 22 \oplus_3 1$. For $s = 22 = s_2 s_1$, $\min\{i \mid s_i \ne 2\} = 3 = |s| + 1$, and

$h_3(13) = 000$.

Using the above notation we have, e.g.,

$h_3^3(n) = h_3(|h_3^2(n)|) = h_3(|h_3(|h_3(n)|)|)$.

Notice that $|h_3(n)| = 0$ for one value of $n$, $|h_3(n)| = 1$ for three values of $n$, $|h_3(n)| = 2$ for nine values of $n$, etc. □
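The successor $\oplus_{|\mathcal{B}|}$ and the resulting $h_{|\mathcal{B}|}$ can be sketched in Python. The sketch writes the symbols of $\mathcal{B}$ as the digits $0, \ldots, |\mathcal{B}|-1$, so it assumes $|\mathcal{B}| \le 10$; the function names are hypothetical. The loop scans from $s_1$, the rightmost digit, exactly as in the definition of $k$. Incidentally, with base 1 the same successor yields $0^n$, matching the convention for $h_1$.

```python
def succ(s, base):
    """s (+)_base 1 for a string of digits 0..base-1 (base <= 10); s_1 is the rightmost digit."""
    digits = [int(c) for c in s]
    k = len(digits) - 1
    while k >= 0 and digits[k] == base - 1:   # digits s_i with i < k become 0
        digits[k] = 0
        k -= 1
    if k < 0:                                 # s = (base-1)^{|s|}: s' = 0^{|s|+1}
        return "0" * (len(s) + 1)
    digits[k] += 1                            # s_k + 1; digits to its left unchanged
    return "".join(str(x) for x in digits)

def h_base(n, base):
    """h_base(0) = empty string, h_base(n) = h_base(n-1) (+)_base 1."""
    s = ""
    for _ in range(n):
        s = succ(s, base)
    return s

print(h_base(11, 3), h_base(12, 3), h_base(13, 3))   # the steps of Example 5.30
```

For base 3 this prints 21, 22 and 000, the values worked out in Example 5.30.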


In general, since $|s \oplus_{|\mathcal{B}|} 1| = |s|$ except for $s = \{|\mathcal{B}|-1\}^{|s|}$, the above definitions, by design, give us the following lemma.

Lemma 5.6. For $|\mathcal{B}| \ge 1$, there are $|\mathcal{B}|^r$ values of $n \in \mathbb{N}$ such that $|h_{|\mathcal{B}|}(n)| = r$.

Lemma 5.6 immediately allows us to show the following.

Lemma 5.7. Let $n \in \mathbb{N}$. For $|\mathcal{B}| \ge 2$,

$|h_{|\mathcal{B}|}(n)| = \lfloor \log_{|\mathcal{B}|}((|\mathcal{B}| - 1)(n+1)) \rfloor$.

For $|\mathcal{B}| = 1$, $|h_1(n)| = n$.


Proof: For $|\mathcal{B}| = 1$, $|h_1(n)| = n$ by definition. So consider $|\mathcal{B}| \ge 2$. Lemma 5.6 tells us that there are $|\mathcal{B}|^r$ values of $n \in \mathbb{N}$ such that $|h(n)| = r$, and $|h(n)|$ is nondecreasing in $n$. Hence there are

$\sum_{i=0}^{r} |\mathcal{B}|^i = \frac{|\mathcal{B}|^{r+1} - 1}{|\mathcal{B}| - 1}$

values of $n$ such that $|h(n)| \le r$, and

$|h(n)| = \min\{k \mid n < \sum_{i=0}^{k} |\mathcal{B}|^i\}$
$= \min\{k \mid n + 1 \le \frac{|\mathcal{B}|^{k+1} - 1}{|\mathcal{B}| - 1}\}$
$= \min\{k \mid (|\mathcal{B}| - 1)(n+1) < |\mathcal{B}|^{k+1}\}$
$= \lfloor \log_{|\mathcal{B}|}((|\mathcal{B}| - 1)(n+1)) \rfloor$. □

  n    h_3(n)   ℓ^1_3(n)   h_3(|h_3(n)|)   ℓ^2_3(n)
  0    λ        2          λ               2
  1    0        020        0               0200
  2    1        021        0               0201
  3    2        022        0               0202
  4    00       1200       1               02100
  5    01       1201       1               02101
  6    02       1202       1               02102
  7    10       1210       1               02110
  8    11       1211       1               02111
  9    12       1212       1               02112
 10    20       1220       1               02120
 11    21       1221       1               02121
 12    22       1222       1               02122
 13    000      002000     2               022000
 14    001      002001     2               022001
 15    002      002002     2               022002
 16    010      002010     2               022010
 17    011      002011     2               022011
 18    012      002012     2               022012
 19    020      002020     2               022020
  ⋮
 38    221      002221     2               022221
 39    222      002222     2               022222
 40    0000     0120000    00              12000000
 41    0001     0120001    00              12000001
 42    0002     0120002    00              12000002
 43    0010     0120010    00              12000010
 44    0011     0120011    00              12000011
  ⋮
120    2222     0122222    00              12002222
121    00000    10200000   01              120100000
122    00001    10200001   01              120100001

Table 5.2. Construction of pointer representations $\ell^1_3$ and $\ell^2_3$, for $|\mathcal{B}| = 3$.
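Lemmas 5.6 and 5.7 can be checked numerically without floating-point logarithms. In this Python sketch, which uses hypothetical helper names, `h_len` computes $|h_{|\mathcal{B}|}(n)|$ directly from the counting argument of Lemma 5.6, since there are $|\mathcal{B}|^r$ strings of each length $r$. `floor_log` evaluates the closed form of Lemma 5.7 with integer arithmetic only.

```python
def floor_log(x, b):
    """Largest k with b**k <= x, for integers x >= 1 and b >= 2 (no floating point)."""
    k = 0
    while b ** (k + 1) <= x:
        k += 1
    return k

def h_len(n, b):
    """|h_b(n)| by counting: b^0 strings of length 0, b^1 of length 1, and so on."""
    k, covered = 0, 1                  # covered = number of n with |h_b(n)| <= k
    while n >= covered:
        k += 1
        covered += b ** k
    return k

for b in (2, 3, 4, 7):
    assert all(h_len(n, b) == floor_log((b - 1) * (n + 1), b) for n in range(5000))
```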


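We now define our class of pointer representations, extending the $\ell^1$ and $\ell^2$ of Example 5.29. For $|\mathcal{B}| = 2$ we want:

$\ell^1(n) = 0^{|h(n)|} 1 h(n)$
$\ell^2(n) = 0^{|h(|h(n)|)|} 1 h(|h(n)|) h(n)$
$\ell^3(n) = 0^{|h(|h(|h(n)|)|)|} 1 h(|h(|h(n)|)|) h(|h(n)|) h(n)$
$\vdots$

Notice, however, that for $|\mathcal{B}| > 2$ the first component of each of these, the unary length prefix, can in fact be encoded in base $|\mathcal{B}| - 1$. This leaves one unused symbol to serve as the endmarker. So we can formalize the class of pointer representations as follows.

Definition. Let $|\mathcal{B}| \ge 2$. We can define a class of pointer representations $\ell^k_{|\mathcal{B}|}$, for $k \ge 1$, as follows:

$\ell^k_{|\mathcal{B}|}(n) = \{h_{|\mathcal{B}|-1}(|h^k_{|\mathcal{B}|}(n)|)\,(|\mathcal{B}|-1)\}_0 \cup \Big(\bigcup_{i=1}^{k} \{h^i_{|\mathcal{B}|}(n)\}_{n_i}\Big)$,

where

$n_i = |h_{|\mathcal{B}|-1}(|h^k_{|\mathcal{B}|}(n)|)| + 1 + \sum_{j=i+1}^{k} |h^j_{|\mathcal{B}|}(n)|$.

Therefore

$\ell^k_{|\mathcal{B}|}(n) = h_{|\mathcal{B}|-1}(|h^k_{|\mathcal{B}|}(n)|)\,(|\mathcal{B}|-1)\,h^k_{|\mathcal{B}|}(n)\,h^{k-1}_{|\mathcal{B}|}(n) \cdots h^1_{|\mathcal{B}|}(n)$,

where $(|\mathcal{B}|-1)$ denotes the single endmarker symbol.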


Example 5.31. We now verify that the definition behaves as we would like for $|\mathcal{B}| = 3$, writing $h$ to mean $h_3$. In particular,

$\ell^1_3(n) = h_2(|h(n)|)\,2\,h(n)$
$\ell^2_3(n) = h_2(|h(|h(n)|)|)\,2\,h(|h(n)|)\,h(n)$
$\ell^3_3(n) = h_2(|h(|h(|h(n)|)|)|)\,2\,h(|h(|h(n)|)|)\,h(|h(n)|)\,h(n)$

Thus,

$\ell^3_3(n) = \{h_2(|h^3(n)|)\,2\}_0 \cup \Big(\bigcup_{i=1}^{3} \{h^i(n)\}_{n_i}\Big)$,

where

$n_1 = |h_2(|h^3(n)|)| + 1 + |h^3(n)| + |h^2(n)|$
$n_2 = |h_2(|h^3(n)|)| + 1 + |h^3(n)|$
$n_3 = |h_2(|h^3(n)|)| + 1$

So

$\ell^3_3(n) = h_2(|h^3(n)|)\,2\,h^3(n)\,h^2(n)\,h^1(n)$. □

The length $|\ell^k_{|\mathcal{B}|}(n)|$ should immediately be clear.
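Putting the pieces together, the following Python sketch computes $\ell^k_{|\mathcal{B}|}(n)$ directly from the definition. The names are hypothetical, and symbols are written as digits, so $2 \le |\mathcal{B}| \le 10$. Rather than iterating the successor, `h_b` finds the length $k$ of $h_{|\mathcal{B}|}(n)$ by skipping the shorter strings. It then writes the remainder with exactly $k$ base-$|\mathcal{B}|$ digits, which gives the same strings as repeated $\oplus_{|\mathcal{B}|} 1$. The sketch reproduces entries of Tables 5.1 and 5.2. It also checks one exact partial Kraft sum for $\ell^1_3$, grouping $n$ by $r = |h_3(n)|$ as in the proof of Theorem 5.19.

```python
from fractions import Fraction

def h_b(n, b):
    """h_b(n): strings of length k come after all shorter ones, in base-b counting order.

    For b = 1 this is 0^n, the convention used in the text."""
    if b == 1:
        return "0" * n
    k, shorter = 0, 0
    while n >= shorter + b ** k:       # skip the b^k strings of length k
        shorter += b ** k
        k += 1
    r, digits = n - shorter, []
    for _ in range(k):                 # write n - shorter with exactly k base-b digits
        digits.append(str(r % b))
        r //= b
    return "".join(reversed(digits))

def h_iter(n, i, b):
    """h^i_b(n), with h^1 = h and h^{i+1}(n) = h(|h^i(n)|)."""
    s = h_b(n, b)
    for _ in range(i - 1):
        s = h_b(len(s), b)
    return s

def ell(n, k, b):
    """ell^k_b(n) = h_{b-1}(|h^k(n)|) (b-1) h^k(n) h^{k-1}(n) ... h^1(n), for 2 <= b <= 10."""
    parts = [h_iter(n, i, b) for i in range(k, 0, -1)]
    return h_b(len(parts[0]), b - 1) + str(b - 1) + "".join(parts)

print(ell(4, 1, 3), ell(121, 2, 3), ell(7, 2, 2))      # rows of Tables 5.2 and 5.1

# The n with |h_3(n)| <= 6 contribute sum_{r<=6} 3^-(|h_2(r)|+1) = 1/3 + 2/9 + 4/27.
partial = sum(Fraction(1, 3 ** len(ell(n, 1, 3))) for n in range((3 ** 7 - 1) // 2))
print(partial == 1 - Fraction(2, 3) ** 3)
```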


Theorem 5.18. Let $|\mathcal{B}| \ge 2$, $k \ge 1$. Then

$|\ell^k_{|\mathcal{B}|}(n)| = |h_{|\mathcal{B}|-1}(|h^k_{|\mathcal{B}|}(n)|)| + 1 + \sum_{i=1}^{k} |h^i_{|\mathcal{B}|}(n)|$.


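The exact numerical value of $|\ell^k_{|\mathcal{B}|}(n)|$ can be obtained by substituting $\lfloor \log_{|\mathcal{B}|}((|\mathcal{B}|-1)(n+1)) \rfloor$ for $|h_{|\mathcal{B}|}(n)|$ in the expression in Theorem 5.18. Even without doing so, we can see that, for $|\mathcal{B}| = 2$, we essentially have:

$|\ell^1(n)| \approx 2\log n$,
$|\ell^2(n)| \approx \log n + 2\log\log n$,
$|\ell^3(n)| \approx \log n + \log\log n + 2\log\log\log n$.

In any case, we can make the following statement.

Corollary 5.18.1. Let $|\mathcal{B}| \ge 2$, $k \ge 1$. Then

$|\ell^k_{|\mathcal{B}|}(n)| = O(\log n)$.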
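We can now show that each of the pointer representations $\ell^k_{|\mathcal{B}|}$ achieves Kraft storage.

Theorem 5.19. For $|\mathcal{B}| \ge 2$, $k \ge 1$, each of the pointer representations $\ell^k_{|\mathcal{B}|}$ achieves Kraft storage:

$\sum_{n=0}^{\infty} |\mathcal{B}|^{-|\ell^k_{|\mathcal{B}|}(n)|} = 1$.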


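Proof: The proof is by induction on $k$. Once again, we write $\ell$ to denote $\ell_{|\mathcal{B}|}$ and $h$ to denote $h_{|\mathcal{B}|}$.

Basis: For $k = 1$,

$\sum_{n=0}^{\infty} |\mathcal{B}|^{-|\ell^1(n)|} = \sum_{n=0}^{\infty} |\mathcal{B}|^{-(|h_{|\mathcal{B}|-1}(|h(n)|)| + 1 + |h(n)|)}$ (by Theorem 5.18)
$= \sum_{r=0}^{\infty} |\mathcal{B}|^{r} \cdot |\mathcal{B}|^{-(|h_{|\mathcal{B}|-1}(r)| + 1 + r)}$ (by Lemma 5.6)
$= \sum_{r=0}^{\infty} |\mathcal{B}|^{-(|h_{|\mathcal{B}|-1}(r)| + 1)}$
$= \sum_{j=0}^{\infty} (|\mathcal{B}| - 1)^j \cdot |\mathcal{B}|^{-(j+1)}$ (by Lemma 5.6, applied to $h_{|\mathcal{B}|-1}$)
$= \frac{1}{|\mathcal{B}|} \sum_{j=0}^{\infty} \Big(\frac{|\mathcal{B}|-1}{|\mathcal{B}|}\Big)^j = 1$.

Induction step: Assume the result holds for $k$; i.e., assume that

$\sum_{n=0}^{\infty} |\mathcal{B}|^{-|\ell^k(n)|} = 1$.

By the definition of $h^i$, $h^{i+1}(n) = h^i(|h(n)|)$. Grouping the values of $n$ by $j = |h(n)|$ and using Lemma 5.6 then gives

$\sum_{n=0}^{\infty} |\mathcal{B}|^{-|\ell^{k+1}(n)|} = \sum_{n=0}^{\infty} |\mathcal{B}|^{-(|h_{|\mathcal{B}|-1}(|h^{k+1}(n)|)| + 1 + |h(n)| + \sum_{i=2}^{k+1} |h^i(n)|)}$
$= \sum_{j=0}^{\infty} |\mathcal{B}|^{j} \cdot |\mathcal{B}|^{-(|h_{|\mathcal{B}|-1}(|h^{k}(j)|)| + 1 + j + \sum_{i=1}^{k} |h^i(j)|)}$
$= \sum_{j=0}^{\infty} |\mathcal{B}|^{-|\ell^k(j)|}$
$= 1$. □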


Since each of the pointer schemes $\ell^k$ achieves Kraft storage, it follows from Theorem 5.13 that a pointer representation which uses $\ell^k$ also achieves Kraft storage if the list component is storage efficient.

Corollary 5.19.1. Consider a separate concatenation-preserving pointer representation $\rho:\mathbb{D} \to \mathcal{B}^+$ defined by

$\rho(d) = f'(d) \cup \ell^k_{|\mathcal{B}|}(|d|)$,

where $f'$ is the concatenation-preserving function formed from $f: X \to \mathcal{B}^+$. If $f$ achieves Kraft storage, then $\rho$ also achieves Kraft storage.


So we have presented a pointer encoding scheme which allows us to represent lists of unbounded length and also achieve Kraft storage. Consider a fixed position field, separate, concatenation-preserving pointer representation $\rho$, and let us see how well one can do for access. We already know, of course, that we cannot achieve Kraft access. So suppose we want to answer some table lookup question $\gamma_i \in \Gamma$. We can do this by reading the pointer in order to determine whether or not $|d| \ge i$. If it is not, then we immediately return the answer $\emptyset$. If it is, then we go to the appropriate memory location to read the answer. So at worst we need to make

$|\ell^2(i)| + k \approx \log i + 2\log\log i + k$

accesses, where $k$ is some constant depending on the function $f$, at most the size of a field. We can often do even better by reading only enough of the pointer to determine whether $|d| \ge i$. Of course, for $|d| = i$ we would be forced to read $|\ell(|d|)| + k = O(\log|d|)$ cells. We shall discuss this encoding in the context of stacks in Section 6.4.
CHAPTER 6
STACKS

In Chapter 1 we discussed what we mean by a stack: a linear list for which all insertions and deletions are made at the top. Much work has been done to obtain formal specifications of the stack as a data type (see, e.g., Liskov and Zilles [18], Lehman and Smyth [17]), but such a formal definition is unimportant for our purposes. Any scheme which captures our intuitive notion of a stack would suffice. Our goal is to apply some of the techniques developed so far to analyze some stack implementations in terms of Kraft storage and access. We first define the basic stack operations, and in the following sections we examine endmarker and pointer stack representations. Table 6.3 at the end of the chapter summarizes some of the lower bound results.
6.1 Stack Operations 


While there are various operations we might wish to consider, any stack implementation will have PUSH and POP operations. These are presumably the only update operations that we shall want to perform on a stack. We also want some way to read elements in the list; at the least, we need to be able to read the top stack element. So we begin by formally defining these three stack operations: PUSH, POP, TOP.

Viewed in the problem domain, a PUSH operation causes a new value in $X$ to be inserted at the top of the stack, thereby increasing the stack length by one. So a PUSH is a pure update, provided the stack can grow indefinitely. Where the maximum stack length $L$ is bounded, some sort of "Error" statement must be returned if an attempt is made to PUSH a value onto a stack which has no room to grow. Thus, we define a PUSH operation to consist of both a question and an update.


We are considering only domains of the form $\mathbb{D} = \bigcup_{i \in J} X^i$, and a PUSH operation will cause a stack to increase in size by one. It therefore makes little sense to consider domains where $i, i+2 \in J$ but $i+1 \notin J$. So for simplicity we shall henceforth assume that $\mathbb{D} = \bigcup_{i=0}^{L} X^i$, where $L$ may be infinite.

For the problem domains we are considering, if $d \in X^i$, $d \in \mathbb{D}$, and $i < L$, then $X^{i+1} \subseteq \mathbb{D}$. So any value in $X$ can be pushed onto a stack at any time (as long as $|d| < L$), and there are in general $|X|$ different PUSH operations. The following definition states more formally what we mean in the problem domain by a PUSH operation.


L 


Definition. In any problem domain $\mathbb{D} = \bigcup_{i=0}^{L} X^i$, we define the class of PUSH operations

$F_{PUSH} = \{f_{PUSH,x} \mid x \in X\}$,

where each PUSH operation $f_{PUSH,x}$ consists of a question component and an update component:

$f_{PUSH,x} = (q_{PUSH,x}, u_{PUSH,x})$.

For any $d \in \mathbb{D}$,

$q_{PUSH,x}(d) = \begin{cases} \emptyset & \text{if } |d| < L \\ \text{Error} & \text{if } |d| = L \end{cases}$

and

$u_{PUSH,x}(d) = \begin{cases} d \cup \{(|d|, x)\} & \text{if } q_{PUSH,x}(d) = \emptyset \\ d & \text{else} \end{cases}$

If $L$ is infinite, then any finite stack is allowed, and we always have

$q_{PUSH,x}(d) = \emptyset$
$u_{PUSH,x}(d) = d \cup \{(|d|, x)\}$

and so we can view a PUSH operation as a pure update.
Similarly, a POP operation also consists of a question and an update portion.
A POP causes the top stack element to be removed; i.e., the stack length is
decreased by one. If the stack is already empty, however, then its length
should not be decreased and some sort of "Error" must be returned.


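Definition. For any problem domain $\mathbb{D} = \bigcup_{i=0}^{L} X^i$ and any $d \in \mathbb{D}$, we
define a POP operation $f_{\mathrm{POP}}$ by

$$f_{\mathrm{POP}} = (q_{\mathrm{POP}}, u_{\mathrm{POP}}),$$

where

$$q_{\mathrm{POP}}(d) = \begin{cases} \emptyset & \text{if } |d| > 0 \\ \text{Error} & \text{if } |d| = 0 \end{cases}$$

and

$$u_{\mathrm{POP}}(d) = \{ (n, d(n)) \mid 0 \le n < |d| - 1 \}.$$

Note that $u_{\mathrm{POP}}(d) = \emptyset$ when $|d| = 0$ (as well as when $|d| = 1$). We have defined
the POP operation to be a pure update when $|d| \ne 0$.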

We read the stack via the top element, using the operation TOP. Since the
stack state is not altered, $u_{\mathrm{TOP}}(d) = d$ and TOP is defined as a pure question.

Definition. For any problem domain $\mathbb{D} = \bigcup_{i=0}^{L} X^i$ and any $d \in \mathbb{D}$, we
define the TOP operation $f_{\mathrm{TOP}}$ as a pure question:

$$f_{\mathrm{TOP}}(d) = q_{\mathrm{TOP}}(d) = \begin{cases} d(|d|-1) & \text{if } |d| > 0 \\ \text{Error} & \text{if } |d| = 0 \end{cases}$$
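To make the (question, update) pairs concrete, here is a minimal Python sketch of the three problem-domain operations. It is an illustration only: a stack d is modelled as a tuple indexed from 0, as in the definitions, `None` stands for the empty answer ∅, and `L=None` stands for an infinite L. The names `f_push`, `f_pop`, and `f_top` are hypothetical.

```python
from typing import Optional, Tuple

Stack = Tuple[str, ...]   # (d(0), ..., d(|d|-1)); d(|d|-1) is the top
ERROR = "Error"           # None plays the role of the empty answer ∅


def f_push(x: str, d: Stack, L: Optional[int] = None):
    """f_PUSHx = (q_PUSHx, u_PUSHx); L=None models an infinite L."""
    q = ERROR if (L is not None and len(d) == L) else None
    u = d + (x,) if q is None else d      # d ∪ {(|d|, x)}, or d unchanged
    return q, u


def f_pop(d: Stack):
    """f_POP = (q_POP, u_POP)."""
    q = None if len(d) > 0 else ERROR
    u = d[:-1]        # {(n, d(n)) | 0 <= n < |d|-1}; empty when |d| <= 1
    return q, u


def f_top(d: Stack):
    """f_TOP is a pure question, so the state comes back unchanged."""
    q = d[-1] if len(d) > 0 else ERROR
    return q, d


print(f_push("b", ("a",), L=1))   # ('Error', ('a',))
print(f_pop(("a", "b")))          # (None, ('a',))
```

Returning the pair (answer, new state) from every operation mirrors the thesis's view that PUSH and POP are both a question and an update, while TOP leaves the state alone.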


We might have chosen to define a POP operation so as to return the value 
which it deletes from the top of the stack. Instead, we define another operation, 


TPOP, to serve as a combination TOP and POP operation. 
Definition. For any problem domain $\mathbb{D} = \bigcup_{i=0}^{L} X^i$ and any $d \in \mathbb{D}$, we
define the TPOP operation $f_{\mathrm{TPOP}}$ by

$$f_{\mathrm{TPOP}} = (q_{\mathrm{TPOP}}, u_{\mathrm{TPOP}}),$$

where $q_{\mathrm{TPOP}}(d) = q_{\mathrm{TOP}}(d)$ and $u_{\mathrm{TPOP}}(d) = u_{\mathrm{POP}}(d)$.

So TPOP returns an "Error" message precisely when $|d| = 0$. In general, we choose
to discuss the component TOP and POP operations separately and only occasionally
make reference to the TPOP operation.

We have defined the basic stack operations that we shall consider. Notice that
a PUSH or POP operation causes the stack size to increment or decrement by at
most one. It is also possible to execute the composition of a fixed sequence of
operations; e.g., to push a sequence of k symbols onto the stack. We might extend
this notion and consider the execution of a conditional sequence of operations in
which the operation to be executed next (if any) depends on the answer sequence
returned by the operations performed so far. For instance, there might be an
operation to clear the stack; i.e., POP until the stack is empty.
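The fixed and conditional compositions just described can be sketched in Python as follows. This is only an illustration with hypothetical names: `push_all` composes a fixed sequence of PUSHes, and `clear` is a conditional sequence whose next step depends on the answers received so far. Using TPOP rather than POP inside `clear` is a choice made for this sketch, so that the removed values are visible.

```python
from typing import List, Tuple

Stack = Tuple[str, ...]   # (d(0), ..., d(|d|-1)); the top is d[-1]
ERROR = "Error"


def f_tpop(d: Stack):
    """TPOP = (q_TOP, u_POP): answer like TOP, update like POP."""
    q = d[-1] if d else ERROR
    return q, d[:-1]


def push_all(xs: str, d: Stack) -> Stack:
    """Fixed sequence PUSH x1, ..., PUSH xk (unbounded memory assumed)."""
    for x in xs:
        d = d + (x,)
    return d


def clear(d: Stack):
    """Conditional sequence: keep applying TPOP until it answers Error,
    which happens once the stack is empty."""
    answers: List[str] = []
    while True:
        q, d = f_tpop(d)
        answers.append(q)
        if q == ERROR:
            return answers, d


print(clear(push_all("ab", ())))   # (['b', 'a', 'Error'], ())
```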

We shall in the rest of the chapter consider several stack representations and 
see how efficiently it is possible to perform the basic stack operations. Recall that 
the operation definitions we have presented describe behavior in the problem 
domain; for a particular representation, the operation behavior in the machine 


domain might or might not resemble the problem domain behavior. 
6.2 The TOS Endmarker Representation


Consider using an endmarker representation to implement a stack and the
PUSH, POP, TOP operations. The following example illustrates one possible such
implementation.


Example 6.1. Let $X = \{a, b\}$, $\Sigma = \{0, 1, 2\}$, and $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. Let the function
$f : X \cup \{\varphi\} \to \Sigma^*$ be defined by

    f(a) = 0
    f(b) = 1
    f(φ) = 2 = θ

Define the concatenation-preserving endmarker representation $\rho : \mathbb{D} \to \Sigma^*$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{i-1} \cup \{\theta\}_{|d|}.$$

In this representation, one symbol from Σ, namely 2, is reserved to tell us when we
have reached the top of the stack. For instance,

    ρ(λ)      = 2
    ρ(abaa)   = 01002
    ρ(baabba) = 1001102

If we view each $d \in \mathbb{D}$ as a stack, then we might implement the POP, PUSHx, and
TOP operations by first reading the stack representation from left to right until we
detect the end-of-stack marker 2. For a POP operation, we then back up and put
θ in the previous cell. Assuming L is unbounded, this corresponds to the following
algorithm.

    Q_POP:   i ← 0
             while m(i) ≠ 2 do i ← i+1
             if i = 0 then return "Error"
             else m(i-1) ← 2

For PUSHx and TOP operations, we similarly read until we detect the end-of-stack
marker 2, and we can then immediately perform the desired operation.

    Q_PUSHx: i ← 0
             while m(i) ≠ 2 do i ← i+1
             m(i) ← f(x)
             m(i+1) ← 2

    Q_TOP:   i ← 0
             while m(i) ≠ 2 do i ← i+1
             if i = 0 then return "Error"
             else if m(i-1) = 0 then return "a"
             else return "b"

These algorithms give us the following access costs when no Error conditions are
encountered:

$$\begin{aligned}
|\langle Q_{\mathrm{POP}}(\rho(d))\rangle| &= |d| + 2 \\
|\langle Q_{\mathrm{PUSH}x}(\rho(d))\rangle| &= |d| + 2 \\
|\langle Q_{\mathrm{TOP}}(\rho(d))\rangle| &= |d| + 2
\end{aligned}$$

We could improve slightly the access cost for $Q_{\mathrm{TOP}}$ by remembering the previous
cell value in some location called "temp", as we make our left-to-right reading of
the stack representation.

    Q_TOP:   i ← 0
             while m(i) ≠ 2 do temp ← m(i)
                               i ← i+1
             if i = 0 then return "Error"
             else if temp = 0 then return "a"
             else return "b"

This modified algorithm gives us a memory cell access cost of

$$|\langle Q_{\mathrm{TOP}}(\rho(d))\rangle| = |d| + 1.$$

Although temp can be viewed as requiring additional cells, we choose to let temp be
part of our processor state, and so we do not include it in the memory access cost. □
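The access counts above can be reproduced with a small Python simulation of Example 6.1. This is a sketch under one stated assumption about counting: a new access starts whenever a cell other than the one touched last is read or written, so "read m(i), then rewrite m(i)" is a single access. That rule is an assumption of this sketch that happens to reproduce the stated costs. `Memory`, `run`, and the `q_*` names are hypothetical.

```python
F = {"a": 0, "b": 1}
THETA = 2                     # f(φ) = 2 is the endmarker


class Memory:
    """Cells 0, 1, 2, ... with an access counter.

    Assumed counting rule: touching a cell other than the one touched
    last starts a new access.
    """

    def __init__(self, contents, blank=0):
        self.cells = dict(enumerate(contents))
        self.blank = blank            # arbitrary contents beyond rho(d)
        self.last = None
        self.accesses = 0

    def _touch(self, i):
        if i != self.last:
            self.accesses += 1
            self.last = i

    def read(self, i):
        self._touch(i)
        return self.cells.get(i, self.blank)

    def write(self, i, v):
        self._touch(i)
        self.cells[i] = v

    def prefix(self, n):
        return [self.cells.get(i, self.blank) for i in range(n)]


def rho(d):
    """TOS representation: d(i) in cell i-1, endmarker in cell |d|."""
    return [F[x] for x in d] + [THETA]


def q_pop(m):
    i = 0
    while m.read(i) != THETA:
        i += 1
    if i == 0:
        return "Error"
    m.write(i - 1, THETA)


def q_push(m, x):
    i = 0
    while m.read(i) != THETA:
        i += 1
    m.write(i, F[x])              # same access as the read that ended the loop
    m.write(i + 1, THETA)


def q_top(m):
    i = 0
    while m.read(i) != THETA:
        i += 1
    if i == 0:
        return "Error"
    return "ab"[m.read(i - 1)]    # going back to cell i-1 costs one more access


def q_top_temp(m):
    i, temp = 0, None
    while (v := m.read(i)) != THETA:
        temp = v                  # temp is processor state, not a memory cell
        i += 1
    if i == 0:
        return "Error"
    return "ab"[temp]


def run(alg, d, *args):
    m = Memory(rho(d))
    answer = alg(m, *args)
    return answer, m.accesses, m


for d in ["a", "abaa", "baabba"]:
    print(d, [run(alg, d)[1] for alg in (q_pop, q_top, q_top_temp)],
          run(q_push, d, "b")[1])
```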


A representation such as ρ in Example 6.1 is a natural one to use if we choose
to implement a stack with an endmarker representation. We assume the bottom of
the stack is at some fixed (known) location, and we reserve some string $\theta \in \Sigma^+$ to
denote the top of the stack. We shall also require that a TOS endmarker
representation have fixed position fields and that $D(\theta) \subseteq \bigcup_{x \in X} D(f(x))$; the
reasons for these assumptions will be made clear shortly. We now make the
following definition.


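Definition. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varphi\} \to \Sigma^+$. Let
$\rho : \mathbb{D} \to \Sigma^+$ be any fixed position field endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|+1}},$$

where $n_i \in \mathbb{N}$ for any $i \in \mathbb{N}^+$, and $f(\varphi) = \theta$. If $D(\theta) \subseteq \bigcup_{x \in X} D(f(x))$, then
we refer to ρ as a top of stack (TOS) endmarker representation.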


Clearly the representation ρ in Example 6.1 is a TOS endmarker representation. We
use the term TOS because the endmarker θ is always situated in the set of cells
which the stack element $d(|d|+1)$ would occupy, if there were one. In other words,
θ is in the field at the top of the stack. The representation is easiest to visualize
when $n_{i+1} > n_i$ and each field consists of contiguous memory cells. Notice also that
it is not necessary that each field have size one. The following example illustrates
another TOS endmarker representation and shows that we need not restrict
ourselves to the case where $|\Sigma| \ge |X| + 1$.


Example 6.2. Let $X = \{a, b\}$, $\Sigma = \{0, 1\}$, and $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. Define the function
$f : X \cup \{\varphi\} \to \Sigma^*$ by

    f(a) = 00
    f(b) = 01
    f(φ) = 1 = θ

Then we can define the TOS endmarker representation $\rho : \mathbb{D} \to \Sigma^+$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{2(i-1)} \cup \{\theta\}_{2|d|}.$$

For instance, we have

    ρ(λ)      = 1
    ρ(abaa)   = 000100001
    ρ(baabba) = 0100000101001


Similar to what we did in Example 6.1, we can implement the POP, PUSHx, and
TOP operations by first reading ρ(d) from left to right until we detect the
end-of-stack marker θ. However, since $|f(a)| = |f(b)| = 2$, we can locate θ by reading only
cells 0, 2, 4, ..., until we detect a 1. We then perform the desired operation in a
straightforward way. Thus, assuming L is unbounded, we might use the following
algorithms.


    Q_POP:   i ← 0
             while m(i) ≠ 1 do i ← i+2
             if i = 0 then return "Error"
             else m(i-2) ← 1

    Q_PUSHx: i ← 0
             while m(i) ≠ 1 do i ← i+2
             m(i) ← 0
             if x = a then m(i+1) ← 0
             if x = b then m(i+1) ← 1
             m(i+2) ← 1

    Q_TOP:   i ← 0
             while m(i) ≠ 1 do i ← i+2
             if i = 0 then return "Error"
             else if m(i-1) = 0 then return "a"
             else return "b"

These produce the following access costs, when no Error conditions are encountered:

$$\begin{aligned}
|\langle Q_{\mathrm{POP}}(\rho(d))\rangle| &= |d| + 2 \\
|\langle Q_{\mathrm{PUSH}x}(\rho(d))\rangle| &= |d| + 3 \\
|\langle Q_{\mathrm{TOP}}(\rho(d))\rangle| &= |d| + 2
\end{aligned}$$

In both Examples 6.1 and 6.2 we found that the access costs for the stack
operations POP, PUSHx, and TOP grow with |d|. This leads us to wonder whether
it is ever possible to perform the operations with fewer accesses. We shall prove
that the answer is no. In particular, whenever a TOS endmarker representation is
used we show that for each $d \in \mathbb{D}$ it must be the case that

$$\begin{aligned}
|\langle Q_{\mathrm{POP}}(\rho(d))\rangle| &\ge |d| + 1 \\
|\langle Q_{\mathrm{PUSH}x}(\rho(d))\rangle| &\ge |d| + 2 \\
|\langle Q_{\mathrm{TOP}}(\rho(d))\rangle| &\ge \begin{cases} \lceil (|d|-1)/2 \rceil + 2 & \text{for } |d| > 0 \\ 1 & \text{for } |d| = 0 \end{cases}
\end{aligned}$$
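The costs of Example 6.2 can be set against these bounds with a short Python simulation. It is a sketch, not the author's code: it assumes that a new access starts whenever a cell other than the previously touched one is read or written (a rule that reproduces the costs |d| + 2, |d| + 3, |d| + 2 stated above), and all helper names are hypothetical. In this sketch POP and PUSHx land one access above their bounds, while TOP moves further from its bound as |d| grows.

```python
from math import ceil

F = {"a": (0, 0), "b": (0, 1)}
THETA = 1                          # f(φ) = 1


class Memory:
    """Assumed counting rule: a new access starts whenever a cell other
    than the previously touched one is read or written."""

    def __init__(self, contents):
        self.cells, self.last, self.accesses = dict(enumerate(contents)), None, 0

    def _touch(self, i):
        if i != self.last:
            self.accesses, self.last = self.accesses + 1, i

    def read(self, i):
        self._touch(i)
        return self.cells.get(i, 0)

    def write(self, i, v):
        self._touch(i)
        self.cells[i] = v


def rho(d):
    """d(i) occupies cells 2(i-1), 2(i-1)+1; the endmarker sits in cell 2|d|."""
    return [c for x in d for c in F[x]] + [THETA]


def scan(m):
    """Read only the even cells until the endmarker 1 is seen."""
    i = 0
    while m.read(i) != THETA:
        i += 2
    return i


def q_pop(m):
    i = scan(m)
    if i == 0:
        return "Error"
    m.write(i - 2, THETA)


def q_push(m, x):
    i = scan(m)
    m.write(i, 0)
    m.write(i + 1, F[x][1])
    m.write(i + 2, THETA)


def q_top(m):
    i = scan(m)
    if i == 0:
        return "Error"
    return "a" if m.read(i - 1) == 0 else "b"


def lower_bounds(n):
    """Bounds stated above for any TOS endmarker representation, |d| = n >= 1."""
    return {"POP": n + 1, "PUSHx": n + 2, "TOP": ceil((n - 1) / 2) + 2}


def costs(d):
    out = {}
    for name, alg, args in (("POP", q_pop, ()), ("PUSHx", q_push, ("a",)),
                            ("TOP", q_top, ())):
        m = Memory(rho(d))
        alg(m, *args)
        out[name] = m.accesses
    return out


for d in ["a", "abaa", "baabba"]:
    print(d, costs(d), lower_bounds(len(d)))
```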


To aid us in proving these results we prove three lemmas. The first says that
when |d| = 0 any algorithm for a POP, PUSHx, or TOP operation will access the
$n_1$ field, which is also the $n_{|d|+1}$ field.


Lemma 6.1. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and let $d_0 \in \mathbb{D}$, $|d_0| = 0$. Consider a function
$f : X \cup \{\varphi\} \to \Sigma^+$. Let $\rho : \mathbb{D} \to \Sigma^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|+1}}.$$

Then any implementation of a POP, a PUSHx, or a TOP operation on data
base $d_0 \in \mathbb{D}$ must access some cell in the $n_1 = n_{|d_0|+1}$ field.

Proof: If $|d_0| = 0$, then $\rho(d_0) = \{\theta\}_{n_1}$. Thus, if a stack operation is performed
without accessing the $n_1$ field, then no cells in $\rho(d_0)$ were accessed at all. Even if
we accessed every one of the (infinite number of) other memory cells, we would get
no information concerning whether or not $|d_0| = 0$. Effectively, this says that we
were able to perform the operation with no accesses, an impossibility. □


Lemma 6.2 guarantees that performing a stack operation on any $d \in \mathbb{D}$ causes the
$n_1$ field to be accessed.

Lemma 6.2. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varphi\} \to \Sigma^+$. Let
$\rho : \mathbb{D} \to \Sigma^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|+1}}.$$

Then for all $d \in \mathbb{D}$ any implementation of a POP, a PUSHx, or a TOP
operation must access some cell in the field $n_1$.


Proof: Let $d_0$ be some stack, $d_0 \in \mathbb{D}$. As a consequence of Lemma 6.1, the result
of this lemma clearly holds for $|d_0| = 0$. So consider the case where $|d_0| > 0$, and
let $m_0$ be a memory state which contains the representation of $d_0$; $m_0 \supseteq \rho(d_0)$.
Assume there is an algorithm $Q_{\mathrm{OP}}$ for the stack operation OP (one of POP,
PUSHx, TOP) such that $\{\langle Q_{\mathrm{OP}}(\rho(d_0))\rangle\}$ does not contain any cells in the $n_1$ field.
Let $m_1$ be a memory state which differs from $m_0$ only in the contents of field $n_1$:

$$m_1 = \{ (n, m_0(n)) \mid n \notin D(\text{field } n_1) \} \cup \{\theta\}_{n_1}.$$

So $m_1$ represents the empty stack $d_1$, $|d_1| = 0$. Since $Q_{\mathrm{OP}}$ does not access the $n_1$
field when applied to memory state $m_0$, it also does not access the field $n_1$ for the
memory state $m_1$. Thus, $Q_{\mathrm{OP}}$ performs the same operation in either case. Let us
now look separately at the three stack operations.
(i) Consider the operation POP, and notice that $q_{\mathrm{POP}}(d_1) = \text{Error}$ whereas
$q_{\mathrm{POP}}(d_0) \ne \text{Error}$. Thus $Q_{\mathrm{POP}}$ cannot always operate correctly without accessing the $n_1$ field.
(ii) Similarly, $Q_{\mathrm{TOP}}$ cannot always give the right answer without accessing the field
$n_1$, because $q_{\mathrm{TOP}}(d_1) = \text{Error}$ whereas $q_{\mathrm{TOP}}(d_0) \ne \text{Error}$.
(iii) $Q_{\mathrm{PUSH}x}$ must write $f(x)$ into field $n_1$ if and only if the current memory state
contains a representation of the empty stack.
Thus, for all $d \in \mathbb{D}$, an algorithm which implements a POP, a PUSHx, or a TOP
operation will access field $n_1$. □


It is also necessary that the endmarker field be accessed, as the following lemma
shows.

Lemma 6.3. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varphi\} \to \Sigma^+$. Let
$\rho : \mathbb{D} \to \Sigma^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|+1}}.$$

Then for all $d \in \mathbb{D}$ any implementation of a POP, a PUSHx, or a TOP
operation must access the $n_{|d|+1}$ field.


Proof: For a PUSHx operation,

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|+1}}$$

and

$$\rho(u_{\mathrm{PUSH}x}(d)) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(x)\}_{n_{|d|+1}} \cup \{\theta\}_{n_{|d|+2}},$$

and so field $n_{|d|+1}$ must be not only accessed but rewritten.

The rest of the proof is similar to that of Lemma 6.2. Lemma 6.1 shows that
this lemma holds for any $d_0 \in \mathbb{D}$ such that $|d_0| = 0$. So consider the case $|d_0| > 0$, and
let $m_0$ be a memory state such that $m_0 \supseteq \rho(d_0)$. Assume there is an algorithm $Q_{\mathrm{OP}}$
for the stack operation OP such that performing $Q_{\mathrm{OP}}(\rho(d_0))$ does not cause any
cell in the $n_{|d_0|+1}$ field to be accessed. Choose $k \in \mathbb{N}$ such that the $n_k$ and $n_{k+1}$
fields are not accessed (e.g., choose $k > |d_0| + 1$). Now define a memory state $m_1$
that differs from $m_0$ only in fields $n_{|d_0|+1}$, $n_k$, and $n_{k+1}$:

$$m_1 = \{ (n, m_0(n)) \mid n \notin D(\text{field } n_{|d_0|+1}),\ n \notin D(\text{field } n_k),\ n \notin D(\text{field } n_{k+1}) \} \cup \{f(x_1)\}_{n_{|d_0|+1}} \cup \{f(x_2)\}_{n_k} \cup \{\theta\}_{n_{k+1}},$$

where $x_1$ is any element in X and $x_2 \in X$ is such that $x_2 \ne d_0(|d_0|)$. Pick $d_1 \in \mathbb{D}$
such that $\rho(d_1) \subseteq m_1$. Since $Q_{\mathrm{OP}}$ accesses neither the $n_{|d_0|+1}$ field, the $n_k$ field, nor the
$n_{k+1}$ field, $Q_{\mathrm{OP}}$ is not a correct algorithm for either of the stack operations POP or
TOP, because no such implementation can perform correctly for both $d_0$ and $d_1$.
(This same argument would also cover the PUSHx operation.) Thus, any
algorithm $Q_{\mathrm{OP}}$ must access field $n_{|d|+1}$. □


We can now prove our lower bound results for the number of memory cell accesses
required to perform any POP or PUSHx operation using a TOS endmarker
representation.

Theorem 6.1. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varphi\} \to \Sigma^+$. Let
$\rho : \mathbb{D} \to \Sigma^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|+1}}.$$

Then for all $d \in \mathbb{D}$ any implementation of a POP operation requires at least
|d| + 1 memory cell accesses, and any implementation of PUSHx requires
|d| + 2 accesses; i.e., for all $d \in \mathbb{D}$,

$$\begin{aligned}
|\langle Q_{\mathrm{POP}}(\rho(d))\rangle| &\ge |d| + 1 \\
|\langle Q_{\mathrm{PUSH}x}(\rho(d))\rangle| &\ge |d| + 2
\end{aligned}$$


Proof: Any implementation of a PUSHx or a POP operation using ρ will result in:

$$\begin{aligned}
\rho(u_{\mathrm{PUSH}x}(d)) &= \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{f(x)\}_{n_{|d|+1}} \cup \{\theta\}_{n_{|d|+2}} \\
\rho(u_{\mathrm{POP}}(d)) &= \bigcup_{i=1}^{|d|-1} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|}}
\end{aligned}$$

Assume there is some algorithm $Q_{\mathrm{OP}}$ for POP or PUSHx, for which there is some
p, $1 \le p \le |d_0|$, such that no cell in $n_p$ is in $\{\langle Q_{\mathrm{OP}}(\rho(d_0))\rangle\}$, for $d_0 \in \mathbb{D}$. Let $m_0$
be a memory state such that $m_0 \supseteq \rho(d_0)$, and define a memory state $m_1$ that
differs from $m_0$ only in field $n_p$:

$$m_1 = \{ (n, m_0(n)) \mid n \notin D(\text{field } n_p) \} \cup \{\theta\}_{n_p}.$$

Choose $d_1 \in \mathbb{D}$ such that $\rho(d_1) \subseteq m_1$. Since $D(\theta) \subseteq \bigcup_{x \in X} D(f(x))$, the endmarker
θ is located entirely in the $n_p$ field, and so $Q_{\mathrm{OP}}$ cannot distinguish $\rho(d_1)$ from
$\rho(d_0)$. Performing a PUSHx or a POP operation on $d_1$ would give:

$$\begin{aligned}
\rho(u_{\mathrm{PUSH}x}(d_1)) &= \bigcup_{i=1}^{p-1} \{f(d_1(i))\}_{n_i} \cup \{f(x)\}_{n_p} \cup \{\theta\}_{n_{p+1}} \\
\rho(u_{\mathrm{POP}}(d_1)) &= \bigcup_{i=1}^{p-2} \{f(d_1(i))\}_{n_i} \cup \{\theta\}_{n_{p-1}}
\end{aligned}$$

Thus, we must be able to distinguish $|d_1|$ from $|d_0|$ in order for a PUSHx or POP
operation to necessarily be performed correctly. Since the argument holds for any
p, $1 \le p \le |d|$, we need to access at least |d| cells. By Lemma 6.3, it is also
necessary to detect the endmarker, leading to one additional access and a lower
bound of |d| + 1 for both POP and PUSHx operations. Notice that for a PUSHx
operation, it is, in addition, necessary to write θ in the $n_{|d|+2}$ field, which gives the
|d| + 2 lower bound for the PUSHx operation. □


Whenever f achieves Kraft storage, then $D(\theta) \subseteq \bigcup_{x \in X} D(f(x))$, and so we have the
following corollary.

Corollary 6.1.1. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varphi\} \to \Sigma^+$. Let
$\rho : \mathbb{D} \to \Sigma^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|+1}}.$$

If f achieves Kraft storage, then for all $d \in \mathbb{D}$ any implementation of a POP
operation requires at least |d| + 1 memory cell accesses, and any
implementation of PUSHx requires |d| + 2 accesses; i.e.,

$$\begin{aligned}
|\langle Q_{\mathrm{POP}}(\rho(d))\rangle| &\ge |d| + 1 \\
|\langle Q_{\mathrm{PUSH}x}(\rho(d))\rangle| &\ge |d| + 2
\end{aligned}$$
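As a numeric illustration of the corollary's hypothesis, the Python sketch below computes the Kraft sum of each function f used in the examples of this section. It rests on an assumption about the definition: "f achieves Kraft storage" is read here as the Kraft sum $\sum_{y \in X \cup \{\varphi\}} |\Sigma|^{-|f(y)|}$ of the prefix code f being equal to 1. Under that reading the values agree with Example 6.3 being called storage optimal and Example 6.4 not. The dictionary key "phi" for the endmarker codeword is a naming choice of this sketch.

```python
from fractions import Fraction


def kraft_sum(code, alphabet_size):
    """Sum of |Sigma|^(-|f(y)|) over all codewords f(y) of f."""
    return sum(Fraction(1, alphabet_size ** len(w)) for w in code.values())


# f over X ∪ {φ}; the key "phi" holds the endmarker codeword θ
EXAMPLES = {
    "6.1": ({"a": "0", "b": "1", "phi": "2"}, 3),
    "6.2": ({"a": "00", "b": "01", "phi": "1"}, 2),
    "6.3": ({"a": "0", "b": "10", "phi": "11"}, 2),
    "6.4": ({"a": "0", "b": "1", "phi": "22"}, 3),
}

for name, (f, size) in EXAMPLES.items():
    print(name, kraft_sum(f, size))
```

Exact `Fraction` arithmetic avoids floating-point rounding when checking for equality with 1.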


We have chosen to require that a TOS endmarker representation have fixed
position fields and that $D(\theta) \subseteq \bigcup_{x \in X} D(f(x))$, because these seem to be natural
requirements that are met in most implementations. As Example 6.3 illustrates,
however, if we were to eliminate the condition that the fields be in fixed positions,
then we might sometimes be able to achieve lower access costs than were specified
by Theorem 6.1.


Example 6.3. Let $X = \{a, b\}$, $\Sigma = \{0, 1\}$, and $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$. Consider the storage
optimal function $f : X \cup \{\varphi\} \to \Sigma^*$ defined by

    f(a) = 0
    f(b) = 10
    f(φ) = 11.

Construct from f the concatenation-preserving representation $\rho : \mathbb{D} \to \Sigma^*$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{\theta\}_{n(d)},$$

where

$$n_i(d) = \sum_{j=1}^{i-1} |f(d(j))| \qquad \text{and} \qquad n(d) = \sum_{j=1}^{|d|} |f(d(j))|.$$

Then we have, for instance:

    ρ(aaaaaa) = 00000011
    ρ(bbbbbb) = 10101010101011
    ρ(aabaab) = 0010001011

Notice that the leftmost occurrence of 11 indicates the end of the stack. It is not
necessary, however, to read every element in the stack representation. For instance,
when m(i) = 0 and m(i+2) = 0, there is no need to read m(i+1). Thus, we
could implement POP and PUSH as follows.


    Q_POP:   i ← 1
       loop: while m(i) ≠ 1 do i ← i+2
             if m(i-1) = 0 then i ← i+1
                                goto loop
             if i = 1 then return "Error"
             else m(i-2) ← 1

    Q_PUSHa: i ← 1
       loop: while m(i) ≠ 1 do i ← i+2
             if m(i-1) = 0 then i ← i+1
                                goto loop
             m(i-1) ← 0
             m(i+1) ← 1

    Q_PUSHb: i ← 1
       loop: while m(i) ≠ 1 do i ← i+2
             if m(i-1) = 0 then i ← i+1
                                goto loop
             m(i) ← 0
             m(i+1) ← 1
             m(i+2) ← 1


Using these algorithms to perform a POP or a PUSHx operation on $\rho(a^m)$, we make only
$|d|/2 + k_1$ accesses, for some constant $k_1 \in \mathbb{N}$; on $\rho(b^m)$ we likewise read only about
half of the cells of ρ(d). So ρ is a concatenation-preserving endmarker representation
for which it is not always necessary to make |d| accesses. Note, however, that for
$d = (ab)^n$ these algorithms lead us to access every cell in $D(\rho((ab)^n))$, a total of
$3|d|/2 + k_2$ accesses. □
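Because the skipping scan of Example 6.3 is easy to get subtly wrong, here is a Python sketch of $Q_{\mathrm{POP}}$ and the two PUSHes that logs every accessed cell. The counting rule (a new access whenever a different cell is touched) and all helper names are assumptions of this sketch. In it, a POP on $a^m$ with m even costs m/2 + 3 accesses, a POP on $(ab)^n$ costs 3n + 3 = 3|d|/2 + 3 and touches every cell of $\rho((ab)^n)$, and on $b^m$ the scan still reads one cell per element (about |d| accesses, half of the 2|d| + 2 cells).

```python
F = {"a": "0", "b": "10"}
THETA = "11"


class Memory:
    """Logs a cell whenever a cell other than the last one is touched
    (assumed counting rule), so len(log) is the access count."""

    def __init__(self, contents):
        self.cells, self.last, self.log = dict(enumerate(contents)), None, []

    def _touch(self, i):
        if i != self.last:
            self.log.append(i)
            self.last = i

    def read(self, i):
        self._touch(i)
        return self.cells.get(i, 0)

    def write(self, i, v):
        self._touch(i)
        self.cells[i] = v


def rho(d):
    """Concatenation-preserving: codewords packed left to right, then 11."""
    return [int(c) for c in "".join(F[x] for x in d) + THETA]


def find_end(m):
    """Scan of Example 6.3; returns i such that theta occupies cells i-1, i."""
    i = 1
    while True:
        while m.read(i) != 1:       # step over two cells at a time
            i += 2
        if m.read(i - 1) == 0:      # cell i begins b or theta: move to its 2nd cell
            i += 1
            continue
        return i                    # m(i-1) = m(i) = 1: leftmost 11 found


def q_pop(m):
    i = find_end(m)
    if i == 1:
        return "Error"
    m.write(i - 2, 1)


def q_push(m, x):
    i = find_end(m)
    if x == "a":
        m.write(i - 1, 0)
        m.write(i + 1, 1)
    else:
        m.write(i, 0)
        m.write(i + 1, 1)
        m.write(i + 2, 1)


def pop_cost(d):
    m = Memory(rho(d))
    q_pop(m)
    return len(m.log)


for d in ["aaaaaa", "bbbbbb", "ab", "abab", "ababab"]:
    print(d, pop_cost(d))
```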


Although in the above example we were sometimes able to perform a POP or a
PUSHx operation in only $|d|/2 + k_1$ accesses, at other times we were forced to make
$3|d|/2 + k_2$ accesses. Thus, it seems likely that there would still be an average cost of
about |d| accesses, even though the best-case cost has been improved. If we were to
eliminate the requirement that $D(\theta) \subseteq \bigcup_{x \in X} D(f(x))$, then we would lose storage
optimality but would be able to achieve lower access costs, as Example 6.4 shows.


Example 6.4. Let $X = \{a, b\}$, $\Sigma = \{0, 1, 2\}$, and define the non-storage-optimal
function $f : X \cup \{\varphi\} \to \Sigma^*$ by

    f(a) = 0
    f(b) = 1
    f(φ) = 22

Let $\rho : \mathbb{D} \to \Sigma^*$ be the concatenation-preserving endmarker representation, with fixed
position fields, defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{i-1} \cup \{22\}_{|d|}.$$

For instance,

    ρ(abaab) = 0100122
    ρ(bbab)  = 110122
    ρ(a)     = 022

Possible algorithms to implement POP and PUSHx operations are as follows: 


    Q_POP:   i ← 1
             while m(i) ≠ 2 do i ← i+2
             if m(i-1) ≠ 2 then m(i-1) ← 2
             else if i = 1 then return "Error"
             else m(i-2) ← 2

    Q_PUSHx: i ← 1
             while m(i) ≠ 2 do i ← i+2
             if m(i-1) ≠ 2 then m(i) ← f(x)
                                m(i+2) ← 2
             else m(i-1) ← f(x)
                  m(i+1) ← 2

For all $d \in \mathbb{D}$, these algorithms have the following access costs:

$$\begin{aligned}
|\langle Q_{\mathrm{POP}}(\rho(d))\rangle| &\le \lceil |d|/2 \rceil + k_1 \\
|\langle Q_{\mathrm{PUSH}x}(\rho(d))\rangle| &\le \lceil |d|/2 \rceil + k_2
\end{aligned}$$

for some constants $k_1, k_2 \in \mathbb{N}$. □
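A Python sketch of Example 6.4 shows where the saving comes from: because θ = 22 is two cells long, reading only the odd cells is guaranteed to land on one of its cells. The access-counting rule (a new access whenever a different cell is touched) and the helper names are assumptions of this sketch. The bound ⌈|d|/2⌉ + 3 checked in the tests is simply what this sketch exhibits, i.e. one admissible choice of the constants $k_1, k_2$.

```python
from math import ceil

F = {"a": 0, "b": 1}
END = 2                        # theta = 22 occupies two cells


class Memory:
    """Counts an access whenever a cell other than the last one is touched
    (an assumed counting rule)."""

    def __init__(self, contents):
        self.cells, self.last, self.accesses = dict(enumerate(contents)), None, 0

    def _touch(self, i):
        if i != self.last:
            self.accesses, self.last = self.accesses + 1, i

    def read(self, i):
        self._touch(i)
        return self.cells.get(i, 0)

    def write(self, i, v):
        self._touch(i)
        self.cells[i] = v


def rho(d):
    """Fixed position fields: d(i) in cell i-1, then 22 in cells |d|, |d|+1."""
    return [F[x] for x in d] + [END, END]


def scan(m):
    """Reading odd cells always hits one of the two endmarker cells."""
    i = 1
    while m.read(i) != END:
        i += 2
    return i


def q_pop(m):
    i = scan(m)
    if m.read(i - 1) != END:       # i is the first endmarker cell
        m.write(i - 1, END)
    elif i == 1:
        return "Error"
    else:                          # i is the second endmarker cell
        m.write(i - 2, END)


def q_push(m, x):
    i = scan(m)
    if m.read(i - 1) != END:       # i is the first endmarker cell
        m.write(i, F[x])
        m.write(i + 2, END)
    else:                          # i is the second endmarker cell
        m.write(i - 1, F[x])
        m.write(i + 1, END)


def run(alg, d, *args):
    m = Memory(rho(d))
    alg(m, *args)
    return m.accesses, [m.cells.get(i, 0) for i in range(len(d) + 3)]


print(run(q_pop, "abaab")[0], run(q_pop, "bbab")[0])   # 4 5
```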


Theorem 6.1 made no mention of the TOP operation; in fact, the |d| + 1 result
does not necessarily hold for every $d \in \mathbb{D}$. We can see this by reconsidering
Example 6.1, which we do in the following example.


Example 6.5. Recall the representation ρ from Example 6.1. We presented there an
algorithm $Q_{\mathrm{TOP}}$ which required |d| + 1 accesses, for all $d \in \mathbb{D}$. We now show that
we can sometimes do better than |d| + 1. For instance, consider

    ρ(abbaaba) = 01100102.

From Theorem 6.1, we know that any algorithms $Q_{\mathrm{POP}}$ and $Q_{\mathrm{PUSH}x}$ will access
at least |d| + 1 memory cells, for all $d \in \mathbb{D}$. Let us construct an algorithm $Q_{\mathrm{TOP}}$.
Suppose our algorithm first accesses cell 7. Since m(7) = 2, cell 7 must contain the
endmarker, if cell 7 is part of ρ(d). By reading m(6), we know that $q_{\mathrm{TOP}} = a$ if
$7 \in D(\rho(d))$. Of course, if |d| < 7 then it is possible that $q_{\mathrm{TOP}} = b$. In order to
verify that $q_{\mathrm{TOP}} = a$ we need only access cells m(0), m(2), m(3), m(5), m(6). In
particular, we don't need to access m(1) or m(4), because we already know that
m(0) = m(3) = 0. So upon locating the occurrence of the endmarker in cell 7, we
conjecture that $q_{\mathrm{TOP}} = a$. If m(1) = 2 or m(4) = 2 then we still have $q_{\mathrm{TOP}} = a$.
Thus, we have an example where it is possible to sometimes determine $q_{\mathrm{TOP}}$ in
fewer than |d| + 1 accesses (here 6 rather than 8). Notice, however, that an algorithm
such as we presented here would for some d require more than |d| + 1 accesses; in
particular, if the m(7) we originally accessed were not in our representation. □


In determining $q_{\mathrm{TOP}}(d)$, the trick used in Example 6.5 could allow us to access, for
some $d \in \mathbb{D}$, as few as $\lceil (|d|-1)/2 \rceil + 2$ memory cells. In other words, we would
always access the endmarker field and the field corresponding to the top stack
element. At best we would only have to access half of the remaining |d| - 1 cells.
The following theorem shows that it is never possible to do better.


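Theorem 6.2. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varphi\} \to \Sigma^+$. Let
$\rho : \mathbb{D} \to \Sigma^+$ be any TOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|+1}}.$$

Then for all $d \in \mathbb{D}$ such that $|d| \ge 1$ and for any implementation $Q_{\mathrm{TOP}}$ of a
TOP operation:

$$|\langle Q_{\mathrm{TOP}}(\rho(d))\rangle| \ge \left\lceil \frac{|d|-1}{2} \right\rceil + 2.$$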


Proof: By Lemma 6.3, we know that the $n_{|d|+1}$ field must always be accessed.
Also, it is necessary to access the $n_{|d|}$ field, since this is the value we want to
determine. So the result clearly holds for |d| = 1 and, by also using Lemma 6.2, for
|d| = 2. Consider the case where |d| > 2. We know that we must access the fields
$n_{|d|+1}$ and $n_{|d|}$. Now assume we have an algorithm $Q_{\mathrm{TOP}}$ for $f_{\mathrm{TOP}}$ that for some
$d_1 \in \mathbb{D}$ returns the value $q_{\mathrm{TOP}}(d_1) = x_1$, for some $x_1 \in X$, and for which there
exists $k \in \mathbb{N}$, $1 \le k < |d_1| - 1$, such that $Q_{\mathrm{TOP}}$ accesses neither the $n_k$ nor the $n_{k+1}$
field. Let $m_1 \supseteq \rho(d_1)$. Define a new memory state $m_2$ such that $m_2$ differs from
$m_1$ only in the $n_k$ field, which contains $f(x_2)$ (for $x_2 \ne x_1$), and in the $n_{k+1}$
field, which contains θ. Then $m_2 \supseteq \rho(d_2)$, where

$$d_2(i) = \begin{cases} d_1(i) & \text{for } i < k \\ x_2 & \text{for } i = k. \end{cases}$$

Then, using the algorithm $Q_{\mathrm{TOP}}$, we must get $Q_{\mathrm{TOP}}(\rho(d_2)) = x_1$. But we know
that $f_{\mathrm{TOP}}(d_2) = x_2$. This results in a contradiction, which means that for any
valid algorithm $Q_{\mathrm{TOP}}$ it is never possible to leave two consecutive fields $n_k$ and
$n_{k+1}$ unaccessed, for $1 \le k < |d| - 1$. Since by Lemma 6.2 we must always access the $n_1$ field,
this says that we must make at least $\lceil (|d|-1)/2 \rceil + 2$ accesses. □
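The counting step at the end of this proof can be checked mechanically. The Python sketch below (hypothetical names) encodes only the necessary conditions used in the proof, as a predicate on the set of accessed field indices, and finds the smallest qualifying set by brute force for small |d|. It is a sanity check of the arithmetic $\lceil (|d|-1)/2 \rceil + 2$, not of the adversary argument itself, and it counts fields, each of which costs at least one cell access. It also confirms that the six cells read in Example 6.5 (cells 0, 2, 3, 5, 6, 7, i.e. fields 1, 3, 4, 6, 7, 8, since field $n_i$ is cell i - 1 there) satisfy the conditions.

```python
from itertools import combinations
from math import ceil


def meets_conditions(n, accessed):
    """Necessary conditions on the accessed field indices for |d| = n:
    fields n_1 (Lemma 6.2), n_|d| (the top) and n_{|d|+1} (Lemma 6.3) are
    accessed, and no two consecutive fields n_k, n_{k+1} with
    1 <= k < |d|-1 are both skipped (proof of Theorem 6.2)."""
    if not {1, n, n + 1} <= accessed:
        return False
    return all(k in accessed or k + 1 in accessed for k in range(1, n - 1))


def min_fields(n):
    """Smallest number of fields meeting the conditions, by brute force."""
    fields = range(1, n + 2)
    for size in range(1, n + 2):
        if any(meets_conditions(n, set(s)) for s in combinations(fields, size)):
            return size


for n in range(1, 9):
    print(n, min_fields(n), ceil((n - 1) / 2) + 2)
```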


From Theorem 5.10 we immediately know that a TOS endmarker
representation achieves Kraft storage when the function f does.

Theorem 6.2. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider the function $f : X \cup \{\varphi\} \to \Sigma^+$. If
the function f achieves Kraft storage, then the TOS endmarker representation
$\rho : \mathbb{D} \to \Sigma^+$, defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\theta\}_{n_{|d|+1}},$$

also achieves Kraft storage.


Before we conclude this section, let us say something about finite memories,
$L < \infty$. In our definition of a TOS endmarker representation we, for simplicity,
considered infinite domains and assumed that we would never run out of memory
space. Allowing L to be finite would not have changed our results, except perhaps
when $|\rho(d)| = L$, although our algorithms would, of course, have to be modified.
Also, recall from Section 5.3 that an endmarker representation cannot achieve Kraft
storage for finite L. If we had wanted to allow finite L we perhaps would have
chosen to extend the definition of a TOS endmarker representation as in the
following example.


Example 6.6. Recall Example 6.1, where $X = \{a, b\}$, $\Sigma = \{0, 1, 2\}$, and the function
$f : X \cup \{\varphi\} \to \Sigma^*$ is defined by

    f(a) = 0
    f(b) = 1
    f(φ) = 2.

Assume, however, that L = 4 and that $\mathbb{D} = \bigcup_{i=0}^{4} X^i$. We could define a
representation $\rho : \mathbb{D} \to \Sigma^*$ by

$$\rho(d) = \begin{cases} \bigcup_{i=1}^{|d|} \{f(d(i))\}_{i-1} \cup \{\theta\}_{|d|} & \text{for } |d| \le 3 \\ \bigcup_{i=1}^{|d|} \{f(d(i))\}_{i-1} & \text{for } |d| = 4. \end{cases}$$

Then

    ρ(λ)    = 2
    ρ(abb)  = 0112
    ρ(babb) = 1011


Notice that, using this definition, every possible memory state is a representation of
some stack and ρ achieves Kraft storage. The stack operations can be implemented
essentially as they were in Example 6.1, but we have to watch for |d| = L.


    Q_POP:   i ← 0
             while m(i) ≠ 2 do if i = L-1 then m(i) ← 2
                                               return
                               else i ← i+1
             if i = 0 then return "Error"
             m(i-1) ← 2

    Q_PUSHx: i ← 0
             while m(i) ≠ 2 do if i = L-1 then return "Error"
                               else i ← i+1
             m(i) ← f(x)
             if i < L-1 then m(i+1) ← 2

    Q_TOP:   i ← 0
             while m(i) ≠ 2 do if i = L-1 then temp ← m(i)
                                               goto decode
                               else temp ← m(i)
                                    i ← i+1
     decode: if i = 0 then return "Error"
             else if temp = 0 then return "a"
             else return "b"

These algorithms give the following access costs:

$$|\langle Q_{\mathrm{POP}}(\rho(d))\rangle| = \begin{cases} 1 & \text{if } |d| = 0 \\ |d| + 2 & \text{if } 0 < |d| < L \\ |d| & \text{if } |d| = L \end{cases}$$

$$|\langle Q_{\mathrm{PUSH}x}(\rho(d))\rangle| = \begin{cases} |d| + 2 & \text{if } |d| + 1 < L \\ |d| + 1 & \text{if } |d| + 1 = L \\ |d| & \text{if } |d| = L \end{cases} \;=\; \min\{|d| + 2, L\}$$

$$|\langle Q_{\mathrm{TOP}}(\rho(d))\rangle| = \begin{cases} |d| + 1 & \text{if } |d| < L \\ |d| & \text{if } |d| = L \end{cases} \;=\; \min\{|d| + 1, L\}$$

□
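The finite-memory costs can be checked with a Python sketch of Example 6.6 for L = 4. As before in spirit, but stated here so this block stands alone: the sketch assumes a new access starts whenever a different cell is touched, and its helper names are hypothetical. One deliberate deviation: `q_top` returns directly when it reaches cell L - 1 instead of jumping to `decode`, which also keeps a full stack from being mistaken for an empty one if L were 1.

```python
L = 4
F = {"a": 0, "b": 1}
THETA = 2


class Memory:
    """Counts an access whenever a cell other than the last one is touched
    (an assumed counting rule); only cells 0..L-1 exist."""

    def __init__(self, contents):
        self.cells, self.last, self.accesses = dict(enumerate(contents)), None, 0

    def _touch(self, i):
        assert 0 <= i < L, "access outside the finite memory"
        if i != self.last:
            self.accesses, self.last = self.accesses + 1, i

    def read(self, i):
        self._touch(i)
        return self.cells.get(i, 0)

    def write(self, i, v):
        self._touch(i)
        self.cells[i] = v


def rho(d):
    """Endmarker only when there is room for it (|d| < L)."""
    cells = [F[x] for x in d]
    return cells + [THETA] if len(d) < L else cells


def q_pop(m):
    i = 0
    while m.read(i) != THETA:
        if i == L - 1:             # full stack: no endmarker to find
            m.write(i, THETA)
            return None
        i += 1
    if i == 0:
        return "Error"
    m.write(i - 1, THETA)


def q_push(m, x):
    i = 0
    while m.read(i) != THETA:
        if i == L - 1:
            return "Error"
        i += 1
    m.write(i, F[x])
    if i < L - 1:
        m.write(i + 1, THETA)


def q_top(m):
    i, temp = 0, None
    while (v := m.read(i)) != THETA:
        temp = v
        if i == L - 1:             # full stack: cell L-1 holds the top
            return "ab"[temp]
        i += 1
    if i == 0:
        return "Error"
    return "ab"[temp]


def cost(alg, d, *args):
    m = Memory(rho(d))
    answer = alg(m, *args)
    return answer, m.accesses, [m.cells.get(i, 0) for i in range(L)]


for d in ["", "a", "abb", "babb"]:
    print(d, cost(q_pop, d)[1], cost(q_push, d, "a")[1], cost(q_top, d)[1])
```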


Thus, we certainly could have considered finite memory spaces, but the extra 
complication in our algorithms would not have increased our understanding of TOS 
endmarker representations. Similarly, in the next section we always make the 
assumption that L is infinite. In Section 6.4, where we discuss pointer 
representations for stacks, we shall consider both finite and infinite L. 

In this section we have examined perhaps the most obvious stack endmarker
representation scheme, the TOS endmarker representation. We know as a
consequence of Theorem 6.2 that it is possible for such a representation to achieve
Kraft storage, but we have also shown that any implementation will result in
expensive access costs for every $d \in \mathbb{D}$. In particular,

$$\begin{aligned}
|\langle Q_{\mathrm{POP}}(\rho(d))\rangle| &\ge |d| + 1 \\
|\langle Q_{\mathrm{PUSH}x}(\rho(d))\rangle| &\ge |d| + 2 \\
|\langle Q_{\mathrm{TOP}}(\rho(d))\rangle| &\ge \lceil (|d|-1)/2 \rceil + 2 \qquad (|d| \ge 1)
\end{aligned}$$

for any algorithms $Q_{\mathrm{POP}}$, $Q_{\mathrm{PUSH}x}$, $Q_{\mathrm{TOP}}$ implementing the stack operations POP,
PUSHx, and TOP. This leads us to wonder whether some other type of endmarker
representation could result in cheaper access costs. The POP and PUSHx operations
involve updating the memory contents, but the TOP operation is just a question.
Suppose we were to keep the top of the stack at some fixed location. Such a
representation scheme is discussed in the next section.


6.3 The BOS Endmarker Representation


Consider an endmarker representation of a stack in which the top of the stack 
is always at a fixed (known) location and the bottom of the stack is allowed to 
vary. In this case, the endmarker denotes the bottom of the stack. The following 


example illustrates one possible such implementation. 


Example 6.7. Consider the function f from Example 6.1, where we have
$X = \{a, b\}$, $\Sigma = \{0, 1, 2\}$, $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, and where we define the function
$f : X \cup \{\varphi\} \to \Sigma^*$ by

    f(a) = 0
    f(b) = 1
    f(φ) = 2 = θ

Define the concatenation-preserving endmarker representation $\rho : \mathbb{D} \to \Sigma^+$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i(d)} \cup \{\theta\}_{n(d)},$$

where $n_i(d) = |d| - i$ and $n(d) = |d|$.

In this representation, the endmarker indicates when we have reached the bottom
of the stack. Reading the memory contents "from left to right" corresponds to
reading the elements in the stack from the top down. For instance,

    ρ(λ)      = 2
    ρ(abaa)   = 00102
    ρ(baabba) = 0110012


It is certainly easy to perform a TOP operation, since we need only read m(0).

    Q_TOP:   if m(0) = 2 then return "Error"
             else if m(0) = 0 then return "a"
             else return "b"


On the other hand, consider performing a PUSHb operation on d = ababa:

    ρ(d)             = ρ(ababa)  = 010102
    ρ(u_PUSHb(d))    = ρ(ababab) = 1010102

Notice that it will certainly be necessary to access |d| + 2 cells, since this many cells
are actually rewritten. Intuitively, we want to set m(0) ← 1 and to shift the contents
of each cell in ρ(d) right by one. Recalling the notation introduced in Chapter 3,
one implementation scheme would have the access sequence

    0, 1, 2, ..., |d| - 1, |d|, |d| + 1.

One possible algorithm is the following.
    Q_PUSHx: i ← 0
             temp1 ← f(x)
             while m(i) ≠ 2 do temp1 ↔ m(i)
                               i ← i+1
             m(i) ← temp1
             m(i+1) ← 2


Notice that we have made use of the additional register temp1, as we did in
Example 6.1. Recall also that in Chapter 3 we defined a single access to consist of
reading and then possibly rewriting a cell. Thus, we have written

    temp1 ↔ m(i)

to indicate a single access to m(i), where the old contents of m(i) is stored in temp1
and the old contents of temp1 is stored in m(i). We refer to this as an exchange,
and might have written it out using a second temporary location, temp2:

    temp2 ← m(i)
    m(i) ← temp1
    temp1 ← temp2


Now consider performing a POP operation on d = ababab:

    ρ(d)           = ρ(ababab) = 1010102
    ρ(u_POP(d))    = ρ(ababa)  = 010102

As for PUSHx, a POP operation will have to rewrite |d| cells and so at least |d|
accesses will be required. In this case, we intuitively want to shift the contents of
all of the cells in ρ(d) left by one. One scheme for doing this would have the
access sequence

    0, 1, 2, ..., |d| - 1, |d|, |d| - 1, ..., 2, 1, 0

and could be implemented using the exchange operation described above.


    Q_POP:   if m(0) = 2 then return "Error"
             i ← 1
             while m(i) ≠ 2 do i ← i+1
             temp1 ← 2
             while i > 0 do i ← i-1
                            m(i) ↔ temp1

□
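A Python sketch of Example 6.7 makes the asymmetry concrete: TOP costs one access, PUSHx costs |d| + 2, and this POP costs 2|d| + 1. The sketch assumes a new access starts whenever a cell other than the last one is touched (so each exchange `temp1 ↔ m(i)` counts once); the helper names are hypothetical.

```python
F = {"a": 0, "b": 1}
THETA = 2


class Memory:
    """Counts an access whenever a cell other than the last one is touched
    (an assumed counting rule)."""

    def __init__(self, contents):
        self.cells, self.last, self.accesses = dict(enumerate(contents)), None, 0

    def _touch(self, i):
        if i != self.last:
            self.accesses, self.last = self.accesses + 1, i

    def read(self, i):
        self._touch(i)
        return self.cells.get(i, 0)

    def write(self, i, v):
        self._touch(i)
        self.cells[i] = v


def rho(d):
    """BOS: the top element d(|d|) is in cell 0, the endmarker in cell |d|."""
    return [F[x] for x in reversed(d)] + [THETA]


def q_top(m):
    v = m.read(0)
    return "Error" if v == THETA else "ab"[v]


def q_push(m, x):
    i, temp1 = 0, F[x]
    while (v := m.read(i)) != THETA:
        m.write(i, temp1)          # exchange temp1 <-> m(i): one access
        temp1 = v
        i += 1
    m.write(i, temp1)
    m.write(i + 1, THETA)


def q_pop(m):
    if m.read(0) == THETA:
        return "Error"
    i = 1
    while m.read(i) != THETA:
        i += 1
    temp1 = THETA
    while i > 0:
        i -= 1
        v = m.read(i)
        m.write(i, temp1)          # exchange m(i) <-> temp1: one access
        temp1 = v


def run(alg, d, *args):
    m = Memory(rho(d))
    answer = alg(m, *args)
    return answer, m.accesses, [m.cells.get(i, 0) for i in range(len(d) + 2)]


print(run(q_push, "ababa", "b"))   # (None, 7, [1, 0, 1, 0, 1, 0, 2])
print(run(q_pop, "ababab")[1])     # 13
```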


We refer to a representation such as ρ in Example 6.7 as a BOS endmarker
representation, because the endmarker θ is always situated in the field following
that field which contains the bottom stack element; i.e., in the set of cells which the
bottom stack element would occupy if the stack had another element in it.


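Definition. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varphi\} \to \Sigma^+$. Let
$\rho : \mathbb{D} \to \Sigma^+$ be any endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\theta\}_{n_{|d|+1}},$$

where $n_i \in \mathbb{N}$ for any $i \in \mathbb{N}^+$, and $f(\varphi) = \theta$. If $D(\theta) \subseteq \bigcup_{x \in X} D(f(x))$, then
we refer to ρ as a bottom of stack (BOS) endmarker representation.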


The definition of a BOS endmarker representation is basically the same as that of a
TOS endmarker representation, except that d(i) is located in field $n_{|d|+1-i}$ rather
than in field $n_i$. In other words, the order of the representations of the stack
elements is reversed. The representation is easiest to visualize when $n_{i+1} > n_i$ and
each field consists of contiguous memory cells, but no such requirements are
imposed by the definition.

The BOS endmarker representation was motivated by an attempt to decrease
the access cost for performing a TOP operation. As we shall see, however, we have
not altered the access cost for PUSHx and we have actually worsened, for all
$d \in \mathbb{D}$, the lower bound access cost for POP:

$$|\langle Q_{\mathrm{TOP}}(\rho(d))\rangle| \ge 1$$
$$|\langle Q_{\mathrm{PUSH}x}(\rho(d))\rangle| \ge |d| + 2$$
MOpop( a(d))I > Fatty 2 SM ya, 




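Theorem 6.3. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varphi\} \to \Sigma^+$. Let
$\rho : \mathbb{D} \to \Sigma^+$ be any BOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\theta\}_{n_{|d|+1}}.$$

Then for all $d \in \mathbb{D}$ any implementation of a PUSHx operation requires at least
|d| + 2 memory cell accesses; i.e., for all $d \in \mathbb{D}$,

$$|\langle Q_{\mathrm{PUSH}x}(\rho(d))\rangle| \ge |d| + 2.$$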


Proof: Assume there exists some algorithm $Q_{\mathrm{PUSH}x}$ for performing a PUSHx
operation and some $d_0 \in \mathbb{D}$ such that $|\langle Q_{\mathrm{PUSH}x}(\rho(d_0))\rangle| < |d_0| + 2$. By the
definition of a PUSHx operation we know that

$$\begin{aligned}
\rho(d_0) &= \bigcup_{i=1}^{|d_0|} \{f(d_0(i))\}_{n_{|d_0|+1-i}} \cup \{\theta\}_{n_{|d_0|+1}} \\
\rho(u_{\mathrm{PUSH}x}(d_0)) &= \bigcup_{i=1}^{|d_0|} \{f(d_0(i))\}_{n_{|d_0|+2-i}} \cup \{f(x)\}_{n_1} \cup \{\theta\}_{n_{|d_0|+2}}
\end{aligned}$$

Certainly the values in fields $n_{|d_0|+1}$ and $n_{|d_0|+2}$ must be accessed. Assume that there
is some p, $1 \le p \le |d_0|$, such that $Q_{\mathrm{PUSH}x}(\rho(d_0))$ does not access field $n_p$. Let $m_0$
be a memory state such that $m_0 \supseteq \rho(d_0)$, and as in the proof of Lemma 6.3, let $m_1$
be a memory state which is identical to $m_0$ except in the $n_p$ field, where θ is stored.
If $m_1 \supseteq \rho(d_1)$, then the algorithm $Q_{\mathrm{PUSH}x}$ does not distinguish $d_0$ and $d_1$, and
thus $Q_{\mathrm{PUSH}x}$ does not correctly perform a PUSHx operation on $d_1$, a contradiction.
So any algorithm $Q_{\mathrm{PUSH}x}$ must always access the |d| fields $n_1, n_2, \ldots, n_{|d|}$, as well as
the fields $n_{|d|+1}$ and $n_{|d|+2}$. □


As a consequence of this theorem, we know that the algorithm $Q_{\mathrm{PUSH}x}$ in Example
6.7 is optimal; in fact, we know that for no $d \in \mathbb{D}$ is it possible to make fewer than
|d| + 2 accesses.

Let us now consider the construction of an algorithm for the POP operation.
Using the scheme presented in Example 6.7, we could read the $n_i$ fields essentially
from left to right until we reach the bottom-of-stack endmarker, and then shift the
elements in the representation "left one field". This corresponds to a field access
sequence

    1, 2, ..., |d| - 1, |d|, |d| + 1, |d|, |d| - 1, ..., 2, 1.

If we choose not to read all the way to the endmarker and then backtrack, we could
use an algorithm with a field access sequence

    1, 2, 1, 3, 2, 4, 3, ..., |d|, |d| - 1, |d| + 1, |d|.

Either of these algorithms would, however, require making a total of 2|d| + 1
accesses, and we shall show that it is possible to (always) do better. In order to
motivate the lower bound we shall obtain for the POP operation, we indicate how
the algorithm $Q_{\mathrm{POP}}$ in Example 6.7 could be improved.


Example 6.8. Recall the representation $\rho$ from Example 6.7 and consider performing a POP operation on $d_0 = abaa$. We know that

$$\rho(d_0) = 00102 \quad\text{and}\quad \rho(\mathrm{POP}(d_0)) = 0102.$$

Recall that our definition of access allows us to read and then, if we choose, rewrite a cell. So suppose we first access cell 1. Since $m(1) = 0$, we put a 0 into cell 0, checking, of course, that cell 0 is not the end of the stack. We then read cell 3. Since $m(3) = 0 \ne 2$, we go back to cell 2, which we now read. Since $m(2) \ne 2$, we write a 0 (the old contents of cell 3) into cell 2. We already know that $m(1) \ne 2$, and so we set $m(1)$ to the old value of $m(2)$. At this point we have (correctly) rewritten $m(0)$, $m(1)$, $m(2)$. We now read cell 5. For the case we are considering, $m(5)$ is not included in $\rho(d_0)$, so cell 5 might or might not contain the endmarker 2. In either case, we back up and read cell 4, at which time we find that $m(4) = 2$. Having already read cells 0, 1, 2, 3, we now know that cell 4 contains the BOS endmarker. So we set $m(3) \leftarrow 2$ and are done. Using this procedure we have the memory cell access sequence

$$1, 0, 3, 2, 1, 5, 4, 3, 7, 6, 5, 9, 8, 7, 11, 10, 9, \ldots$$

We might write the algorithm out as follows, making use of two temporary locations, temp1 and temp2.
    Q_POP:  temp1 ← m(1)
            if m(0) = 2 then return "Error"
            else m(0) ← temp1
            if temp1 = 2 then return
            i ← 3
            while m(i) ≠ 2 do
                temp1 ← m(i)
                temp2 ← m(i-1)
                m(i-2) ← temp2
                if temp2 = 2 then return
                else m(i-1) ← temp1
                i ← i + 2
            temp2 ← m(i-1)
            m(i-2) ← temp2
            if temp2 = 2 then return
            else m(i-1) ← 2


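This algorithm results in an access cost of

$$\#[Q_{\mathrm{POP}}(\rho(d))] = \begin{cases} 3 \cdot \frac{|d|}{2} + 2 & \text{for } |d| \text{ even},\\ 3 \cdot \frac{|d|-1}{2} + 2 & \text{for } |d| \text{ odd}. \end{cases}$$

We shall shortly prove that the algorithm is, in fact, optimal. ∎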
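The following Python sketch replays the Example 6.8 strategy on a simulated memory and counts accesses the way the text does: reading a cell and then possibly rewriting it is one access, and coming back to a cell later is a new access. It keeps the assumed encoding f(a) = 0, f(b) = 1 with endmarker 2 (top of stack in cell 0). The names `pop_bos`, `encode`, and `decode` are hypothetical. Cells past the endmarker are filled with arbitrary "foreign" values, which may even equal 2, so the sketch also checks that the algorithm never trusts them.

```python
F = {"a": 0, "b": 1}   # assumed encoding, as in Example 6.8
END = 2                # BOS endmarker


def encode(d, junk=0, pad=3):
    """Stack d (top = d[-1] in cell 0), endmarker, then foreign cells."""
    return [F[c] for c in reversed(d)] + [END] + [junk] * pad


def decode(m):
    out = []
    for v in m:
        if v == END:
            break
        out.append("ab"[v])
    return "".join(reversed(out))


def pop_bos(m):
    """POP with the cell access order 1, 0, 3, 2, 1, 5, 4, 3, ...
    Returns (memory, accesses); raises ValueError on an empty stack."""
    m = list(m)
    visits = []

    def touch(c):
        # Read-then-rewrite of the same cell counts as a single access.
        if not visits or visits[-1] != c:
            visits.append(c)

    touch(1)
    temp1 = m[1]
    touch(0)
    if m[0] == END:
        raise ValueError("Error: POP on an empty stack")
    m[0] = temp1
    if temp1 == END:                 # one-element stack: done
        return m, len(visits)
    i = 3                            # cells 0 .. i-2 are known not to be END
    while True:
        touch(i)
        temp1 = m[i]                 # may be a foreign cell
        touch(i - 1)
        temp2 = m[i - 1]
        if temp2 == END:             # endmarker was in cell i-1
            touch(i - 2)
            m[i - 2] = END
            return m, len(visits)
        m[i - 1] = temp1             # cell i is ours, so shift it left
        touch(i - 2)
        m[i - 2] = temp2
        if temp1 == END:             # endmarker was in cell i
            return m, len(visits)
        i += 2


m, acc = pop_bos(encode("abaa"))
print(decode(m), acc)                # aba 8
```

For every stack length the count agrees with the formula above, $3\lfloor |d|/2 \rfloor + 2$.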


In order to derive a lower bound on the access cost of a POP operation, we begin by proving two lemmas. Recall that our definition of access allows us to read, although certainly not rewrite, a memory cell which is being used by another user.
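Lemma 6.4. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varnothing\} \to \mathcal{B}^+$. Let $\rho : \mathbb{D} \to \mathcal{B}^+$ be any BOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{f(\varnothing)\}_{n_{|d|+1}}.$$

Then a cell in field $n_i$, $i \ge 1$, cannot be rewritten unless each of the fields $n_1, \ldots, n_{i-1}$ has been accessed.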




Proof: A cell in field $n_i$ cannot be rewritten unless it is known that $|d| \ge i - 1$; i.e., we are not allowed to rewrite field $n_i$ if it is in some other user's memory space. Thus, in order to rewrite a cell in field $n_i$, it must be the case that no $n_j$, for $1 \le j < i$, contains the endmarker $f(\varnothing)$. There is no way to guarantee this without accessing each of the fields $n_1, n_2, \ldots, n_{i-1}$. ∎

The following lemma essentially tells us that field $n_i$ cannot be rewritten until field $n_{i+1}$ has been accessed.


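Lemma 6.5. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varnothing\} \to \mathcal{B}^+$. Let $\rho : \mathbb{D} \to \mathcal{B}^+$ be any BOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{f(\varnothing)\}_{n_{|d|+1}}.$$

Consider any algorithm $Q_{\mathrm{POP}}$ for the operation POP. Then, for $1 \le i \le |d|$, there must be an access to field $n_{i+1}$ made previous to the last rewrite of field $n_i$.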


Proof: Recalling the definitions of the BOS endmarker representation and the POP operation,

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{f(\varnothing)\}_{n_{|d|+1}}$$

and

$$\rho(\mathrm{POP}(d)) = \bigcup_{i=1}^{|d|-1} \{f(d(i))\}_{n_{|d|-i}} \cup \{f(\varnothing)\}_{n_{|d|}}.$$

So $f(d(i))$ gets moved from field $n_{|d|+1-i}$ to field $n_{|d|-i}$. Since we can determine the contents of field $n_{|d|+1-i}$ only by making at least one access to that field, field $n_{|d|+1-i}$ must be read before its value can be put into field $n_{|d|-i}$. ∎


For any algorithm $Q_{\mathrm{POP}}$ we can consider its corresponding field access sequence. We prove our lower bound result by lower bounding the length of a sequence which meets the conditions presented in Lemmas 6.4 and 6.5. We first make the following definition.
Definition. For $k, i \in \mathbb{N}$, define a set $S_{(k,i)}$ as follows:

$$S_{(k,i)} = \{k, k+1, \ldots, k+i-1, k+i, \bar{k}, \overline{k+1}, \ldots, \overline{k+i-1}\}.$$

We say that a sequence is an $s(k,i)$-sequence if it contains each of the terms $k, k+1, \ldots, k+i-1, k+i$, if each term in the sequence is in $S_{(k,i)}$, and if the following conditions are satisfied:
(i) For all $r$, $k \le r < k+i$, the last occurrence of $r$ is preceded by $r+1$ or $\overline{r+1}$.
(ii) For all $r$, $k < r < k+i$, the last occurrence of $r$ is preceded by $j$ or $\bar{j}$, for every element $j \in \{k, k+1, \ldots, r-2, r-1\}$.
We define $\sigma(k,i)$ to be an $s(k,i)$-sequence of minimal length, so that

$$|\sigma(k,i)| = \min_{s(k,i)} |s(k,i)|.$$

Since $|\sigma(0,|d|)|$ is minimal over all sequences $s(0,|d|)$, $\sigma(0,|d|)$ corresponds to an optimal access order for performing a POP operation.


Lemma 6.6. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varnothing\} \to \mathcal{B}^+$. Let $\rho : \mathbb{D} \to \mathcal{B}^+$ be any BOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{f(\varnothing)\}_{n_{|d|+1}}.$$

Then for any algorithm $Q_{\mathrm{POP}}$ which implements the operation POP, and for all $d \in \mathbb{D}$,

$$\#[Q_{\mathrm{POP}}(\rho(d))] \ge |\sigma(0,|d|)|.$$

Proof: Recalling the definition of a field access sequence, the proof follows directly from Lemmas 6.4 and 6.5 and from the definition of $\sigma(0,|d|)$. ∎


Now that we have established the correspondence between a sequence $\sigma(0,|d|)$ and $\#[Q_{\mathrm{POP}}(\rho(d))]$, we have the notation with which to prove our lower bound result. We prove this as a consequence of three lemmas.
Lemma 6.7. For $k \in \mathbb{N}$, $k \ge 0$:
(a) $|\sigma(k,0)| = 1$;
(b) $|\sigma(k,1)| = 2$.

Proof: (a) We want the minimal length of a sequence $\sigma(k,0)$ containing $k$ and satisfying conditions (i) and (ii) in the definition of $\sigma$; the one-term sequence $k$ is such a sequence, and therefore $|\sigma(k,0)| = 1$.

(b) We want the minimal length of a sequence $\sigma(k,1)$ containing $k$ and $k+1$ and satisfying (i) and (ii); the sequence $k+1, k$ is such a sequence and clearly must be minimal. Thus, $|\sigma(k,1)| = 2$. ∎


Lemma 6.8. For $i \in \mathbb{N}$, $i \ge 0$, $|\sigma(0,i+2)| = 3 + |\sigma(0,i)|$.

Proof: $\sigma(0,i+2)$ is a minimal length sequence containing $0, 1, \ldots, i, i+1, i+2$, and $\sigma(2,i)$ is a minimal length sequence containing $2, 3, \ldots, i, i+1, i+2$ (both satisfying the above conditions (i) and (ii)). Since conditions (i) and (ii) depend only on the relative values of the terms, $\sigma(2,i)$ is simply a sequence $\sigma(0,i)$ with every term increased by 2, and so $|\sigma(2,i)| = |\sigma(0,i)|$.

(i) We first show that $|\sigma(0,i+2)| \le 3 + |\sigma(0,i)|$. Suppose we have some minimal length sequence $\sigma(2,i)$. We convert this into a sequence $s(0,i+2)$ by considering two cases:
(a) Assume that 2 is preceded by 2 in the sequence $\sigma(2,i)$. Immediately following 2, insert 0, 1, 0 into the sequence.
(b) Assume that 2 is not preceded by 2 in the sequence. (Then it must be the case that 2 is the first field written.) Before 2, insert 1, 0 and after 2 insert 1.
Thus, $|\sigma(0,i+2)| \le 3 + |\sigma(2,i)| = 3 + |\sigma(0,i)|$.

(ii) We now show that $|\sigma(0,i+2)| \ge 3 + |\sigma(0,i)|$. A minimal sequence $\sigma(0,i+2)$ must contain 0 and 1, and a minimal sequence $\sigma(2,i)$ will not contain these. But 1 must appear before the last 0, and 0 (as well as 2) must appear before the last 1. Therefore 0 or 1 must appear at least twice, so at least three terms from $\{0, 1\}$ are needed in addition to the terms of $\sigma(2,i)$. Thus, $|\sigma(0,i+2)| \ge 3 + |\sigma(0,i)|$. ∎


Using the previous two lemmas, we can compute $|\sigma(0,i)|$.


Lemma 6.9. For $i \in \mathbb{N}$, $i \ge 0$, we have:
(a) $|\sigma(0,2i)| = 3i + 1$;
(b) $|\sigma(0,2i+1)| = 3i + 2$.

Proof: By Lemma 6.8, $|\sigma(0,j+2)| = 3 + |\sigma(0,j)|$ for every $j \in \mathbb{N}$. Applying this $i$ times and then using Lemma 6.7, we obtain

$$|\sigma(0,2i)| = 3i + |\sigma(0,0)| = 3i + 1$$

and

$$|\sigma(0,2i+1)| = 3i + |\sigma(0,1)| = 3i + 2. \quad ∎$$
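Lemmas 6.7 through 6.9 can be checked for small $i$ by exhaustive search. The Python sketch below treats every term of a sequence as a plain field index $0, \ldots, i$. It tests conditions (i) and (ii) directly and finds the shortest $s(0,i)$-sequence by trying all sequences of increasing length; the function names are hypothetical. The search is exponential, so it is only practical for about $i \le 4$.

```python
from itertools import product


def is_s_sequence(seq, i):
    """True if seq is an s(0, i)-sequence: it uses exactly the terms
    0, ..., i and satisfies conditions (i) and (ii)."""
    if set(seq) != set(range(i + 1)):
        return False
    first, last = {}, {}
    for pos, r in enumerate(seq):
        first.setdefault(r, pos)
        last[r] = pos
    # (i): the last occurrence of r is preceded by some occurrence of r + 1.
    if any(first[r + 1] > last[r] for r in range(i)):
        return False
    # (ii): the last occurrence of r is preceded by every j < r (0 < r < i).
    return all(first[j] < last[r] for r in range(1, i) for j in range(r))


def sigma_length(i):
    """|sigma(0, i)|, found by exhaustive search over increasing lengths."""
    length = i + 1
    while True:
        if any(is_s_sequence(s, i) for s in product(range(i + 1), repeat=length)):
            return length
        length += 1


print([sigma_length(i) for i in range(5)])   # [1, 2, 4, 5, 7]
```

The values follow the pattern $3i+1$, $3i+2$ of Lemma 6.9. For example, $1, 0, 2, 1$ is a shortest $s(0,2)$-sequence and $0, 2, 1, 0, 4, 3, 2$ is a shortest $s(0,4)$-sequence.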


We now apply this discussion of sequences $\sigma$ and recall from Lemma 6.6 the correspondence to the POP operation. This now allows us to lower bound the number of accesses required to perform a POP operation.


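Theorem 6.4. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varnothing\} \to \mathcal{B}^+$. Let $\rho : \mathbb{D} \to \mathcal{B}^+$ be any BOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{f(\varnothing)\}_{n_{|d|+1}}.$$

Let $Q_{\mathrm{POP}}$ be any implementation of the POP operation. Then for all $d \in \mathbb{D}$:

$$\#[Q_{\mathrm{POP}}(\rho(d))] \ge \begin{cases} 3 \cdot \frac{|d|-1}{2} + 2 & \text{if } |d| \text{ is odd},\\ 3 \cdot \frac{|d|}{2} + 1 & \text{if } |d| \text{ is even}. \end{cases}$$

In other words,

$$\#[Q_{\mathrm{POP}}(\rho(d))] \ge \left\lfloor \tfrac{3|d|}{2} \right\rfloor + 1.$$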


Theorem 6.5 combines the results of Theorems 6.3 and 6.4, along with the trivial observation that $\#[Q_{\mathrm{TOP}}(\rho(d))] \ge 1$.
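Theorem 6.5. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider a function $f : X \cup \{\varnothing\} \to \mathcal{B}^+$. Let $\rho : \mathbb{D} \to \mathcal{B}^+$ be any BOS endmarker representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{f(\varnothing)\}_{n_{|d|+1}}.$$

Let $Q_{\mathrm{POP}}$ be any implementation of the POP operation, let $Q_{\mathrm{PUSH}x}$ be any implementation of the PUSHx operation, and let $Q_{\mathrm{TOP}}$ be any implementation of the TOP operation. Then for all $d \in \mathbb{D}$:

$$\#[Q_{\mathrm{POP}}(\rho(d))] \ge \left\lfloor \tfrac{3|d|}{2} \right\rfloor + 1,$$
$$\#[Q_{\mathrm{PUSH}x}(\rho(d))] \ge |d| + 2,$$
$$\#[Q_{\mathrm{TOP}}(\rho(d))] \ge 1.$$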


Recalling the algorithm for POP that we presented in Example 6.8, we now know that this algorithm is optimal for $|d|$ odd. Perhaps it would be possible to do one access better, however, when $|d|$ is even. As a consequence of the following lemma, it is impossible to achieve the bounds of Theorem 6.4 simultaneously for both $|d|$ odd and $|d|$ even.


Lemma 6.10. Let $i$ be any even natural number. Suppose we have some minimal length sequence $\sigma_1(0,i)$ and some minimal length sequence $\sigma_2(0,i+1)$. Then $\sigma_1(0,i)$ is not a prefix of $\sigma_2(0,i+1)$.

Proof: The sequence $\sigma_1(0,i)$ must contain $0, 1, \ldots, i-1, i$, and $\sigma_2(0,i+1)$ must contain $0, 1, \ldots, i-1, i, i+1$. Since $i$ is even, $|\sigma_1(0,i)| = 3 \cdot \frac{i}{2} + 1$. Because $i + 1$ is odd and $\sigma_2(0,i+1)$ also has minimal length, $|\sigma_2(0,i+1)| = 3 \cdot \frac{i}{2} + 2$. Thus,

$$|\sigma_2(0,i+1)| = |\sigma_1(0,i)| + 1.$$

Suppose $\sigma_1(0,i)$ is a prefix of $\sigma_2(0,i+1)$. Since $\sigma_1(0,i)$ is an $s(0,i)$-sequence, it does not contain $i+1$, and therefore it contains no occurrence of $i$ that is preceded by $i+1$; by condition (i), however, $\sigma_2(0,i+1)$ must contain $i+1$ followed by a later occurrence of $i$. So at least two terms would have to be appended, and there is no way to append a sequence to $\sigma_1(0,i)$ in order to obtain a minimal length sequence $\sigma_2(0,i+1)$. ∎
Note that the proof of Lemma 6.10 does not hold if $i$ is an odd number, because $|\sigma(0,i+1)| = 2 + |\sigma(0,i)|$ for $i$ odd.

Theorem 6.5 gave lower bounds for implementing the stack operations with a BOS endmarker representation. Example 6.7 showed that the bounds for the PUSHx and TOP operations are actually achievable. We can also argue, as a consequence of Lemma 6.10, that the algorithm $Q_{\mathrm{POP}}$ from Example 6.8 is access optimal. Since $Q_{\mathrm{POP}}$ has a minimal number of accesses for $|d|$ odd, it cannot possibly achieve $3 \cdot \frac{|d|}{2} + 1$ accesses for $|d|$ even. Thus, the best it could possibly do would be $3 \cdot \frac{|d|}{2} + 2$ accesses for $|d|$ even, which is precisely what it does do. The following example shows that we could have constructed an algorithm for the POP operation which would have been minimal for $|d|$ even.


Example 6.9. Reconsider the representation $\rho$ from Examples 6.7 and 6.8. The algorithm $Q_{\mathrm{POP}}$ from Example 6.8 is access optimal. Let $Q'_{\mathrm{POP}}$ be an algorithm for the POP operation which has the field access sequence

$$0, 2, 1, 0, 4, 3, 2, 6, 5, 4, 8, 7, 6, \ldots$$

Note that $Q'_{\mathrm{POP}}$ is, in fact, realizable, because this is basically the same algorithm we had before, only with a different starting sequence. This algorithm has access cost

$$\#[Q'_{\mathrm{POP}}(\rho(d))] = \begin{cases} 3 \cdot \frac{|d|}{2} + 1 & \text{if } |d| \text{ is even},\\ 3 \cdot \frac{|d|+1}{2} + 1 & \text{if } |d| \text{ is odd}. \end{cases}$$

Thus, $Q'_{\mathrm{POP}}$ requires a minimal number of accesses for $|d|$ even and is also access optimal. In fact, the BOS endmarker representation $\rho$ of Example 6.7, with TOP and PUSHx implemented as in Example 6.7 and POP implemented as in Example 6.8, is a storage and access optimal implementation $(\rho, Q_{\mathrm{POP}}, Q_{\mathrm{PUSH}x}, Q_{\mathrm{TOP}})$. ∎
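A sketch of $Q'_{\mathrm{POP}}$ in the same simulated setting confirms the access counts just stated. As before, the encoding (f(a) = 0, f(b) = 1, endmarker 2, top of stack in cell 0) is an assumption, the names are hypothetical, and cells past the endmarker hold arbitrary foreign values. The algorithm reads cell 0 and then repeats the pattern "read cell $i$, shift it into cell $i-1$, shift cell $i-1$ into cell $i-2$" for $i = 2, 4, 6, \ldots$.

```python
F = {"a": 0, "b": 1}   # assumed encoding, as in Example 6.8
END = 2                # BOS endmarker


def encode(d, junk=0, pad=3):
    return [F[c] for c in reversed(d)] + [END] + [junk] * pad


def decode(m):
    out = []
    for v in m:
        if v == END:
            break
        out.append("ab"[v])
    return "".join(reversed(out))


def pop_bos_prime(m):
    """POP with the cell access order 0, 2, 1, 0, 4, 3, 2, 6, 5, 4, ...
    Returns (memory, accesses); raises ValueError on an empty stack."""
    m = list(m)
    visits = []

    def touch(c):
        # Read-then-rewrite of the same cell counts as a single access.
        if not visits or visits[-1] != c:
            visits.append(c)

    touch(0)
    if m[0] == END:
        raise ValueError("Error: POP on an empty stack")
    i = 2                            # cells 0 .. i-2 are known not to be END
    while True:
        touch(i)
        temp1 = m[i]                 # may be a foreign cell
        touch(i - 1)
        temp2 = m[i - 1]
        if temp2 == END:             # endmarker was in cell i-1
            touch(i - 2)
            m[i - 2] = END
            return m, len(visits)
        m[i - 1] = temp1
        touch(i - 2)
        m[i - 2] = temp2
        if temp1 == END:             # endmarker was in cell i
            return m, len(visits)
        i += 2


def predicted(n):
    """Access cost of Q'_POP as stated in Example 6.9."""
    return 3 * n // 2 + 1 if n % 2 == 0 else 3 * (n + 1) // 2 + 1


m, acc = pop_bos_prime(encode("aba"))
print(decode(m), acc)                # ab 7
```

Compared with the previous sketch, only the opening accesses differ. That single shift of the pattern moves the extra access from the even lengths to the odd ones.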

As was the case for the TOS endmarker representation, Theorem 5.10 immediately tells us that a BOS endmarker representation achieves Kraft storage when the function $f$ does.
Theorem 6.6. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$ and consider the function $f : X \cup \{\varnothing\} \to \mathcal{B}^+$. If the function $f$ achieves Kraft storage, then the BOS endmarker representation $\rho : \mathbb{D} \to \mathcal{B}^+$, defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{f(\varnothing)\}_{n_{|d|+1}},$$

also achieves Kraft storage.


So we have constructed the BOS endmarker representation as an alternative to the TOS endmarker representation scheme. We thereby decreased the access cost for performing a TOP operation, but in so doing we increased the cost of a POP operation. For a summary of the worst case lower bounds, see Table 6.3 at the end of the chapter.
6.4 The TOS Pointer Representation 


Consider using a pointer representation to implement a stack and the POP, PUSHx, and TOP operations. The following example illustrates one possible such implementation.


Example 6.10. Let $X = \{a,b,c,d\}$, $\mathcal{B} = \{0,1,2,3\}$, and $\mathbb{D} = \bigcup_{i=0}^{3} X^i$. Let the function $f : X \to \mathcal{B}^+$ be defined by

$$f(a) = 0, \quad f(b) = 1, \quad f(c) = 2, \quad f(d) = 3.$$

Define the concatenation-preserving pointer representation $\rho : \mathbb{D} \to \mathcal{B}^+$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{i} \cup \{\ell(|d|)\}_{0},$$

where the pointer component $\ell : J \to \mathcal{B}^+$ is defined by

$$\ell(0) = 0, \quad \ell(1) = 1, \quad \ell(2) = 2, \quad \ell(3) = 3.$$

For instance, $\rho(\lambda) = 0$ for the empty stack $\lambda$, and $\rho(d) = 13$ for the one-element stack $d$.

We assume that $L$ is large enough to represent any $d \in \mathbb{D}$; in particular $L \ge 4$. In order to perform a POP operation in this example we need only decrement the pointer. Notice that there is no need to read any stack elements, since decrementing the pointer automatically decreases $|\rho(d)|$ by one. So we could use the following simple algorithm to perform a POP operation.


    Q_POP:    if m(0) = 0 then return "Error"
              m(0) ← m(0) - 1

By our definition of a memory cell access, this algorithm for POP corresponds to a single access; we read the contents of cell 0 and then, depending on its contents, we may rewrite the value. If $m(0) = 0$, then we return an "Error" message and the second line in the algorithm never gets executed. For a PUSHx or a TOP operation, however, we must read the pointer in order to determine where the top of the stack is, and we then go to the appropriate stack location to perform the operation.

    Q_PUSHx:  if m(0) = 3 then return "Error"
              m(0) ← m(0) + 1
              m(m(0)) ← f(x)

    Q_TOP:    if m(0) = 0 then return "Error"
              return m(m(0))

These algorithms give the following access costs:

$$\#[Q_{\mathrm{POP}}(\rho(d))] = 1,$$
$$\#[Q_{\mathrm{PUSH}x}(\rho(d))] = \begin{cases} 2 & \text{if } |d| < 3,\\ 1 & \text{otherwise}, \end{cases}$$
$$\#[Q_{\mathrm{TOP}}(\rho(d))] = \begin{cases} 2 & \text{if } |d| \ne 0,\\ 1 & \text{otherwise}. \end{cases} \quad ∎$$
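The following Python sketch of Example 6.10 models the memory as a list of $L = 4$ cells. It returns each operation's result together with its access count, reproducing the costs listed above. The function names are hypothetical.

```python
# Example 6.10: X = {a, b, c, d}, B = {0, 1, 2, 3}, stacks of length <= 3.
# Cell 0 holds the pointer l(|d|) = |d|; cell i holds f(d(i)).
F = {"a": 0, "b": 1, "c": 2, "d": 3}
MAX_LEN = 3


def make_memory(d):
    return [len(d)] + [F[c] for c in d] + [0] * (MAX_LEN - len(d))


def pop(m):
    """One access: read cell 0 and, unless it is 0, decrement it."""
    if m[0] == 0:
        return "Error", 1
    m[0] -= 1
    return None, 1


def push(m, x):
    """Read and increment the pointer (1 access), then write f(x) (1 access)."""
    if m[0] == MAX_LEN:
        return "Error", 1
    m[0] += 1
    m[m[0]] = F[x]
    return None, 2


def top(m):
    """Read the pointer, then the cell it points to."""
    if m[0] == 0:
        return "Error", 1
    return "abcd"[m[m[0]]], 2


m = make_memory("ab")
print(top(m), push(m, "d"), top(m), push(m, "a"))
# ('b', 2) (None, 2) ('d', 2) ('Error', 1)
```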


Notice that the representation $\rho$ in the above example allowed us to implement the set of stack states $\mathbb{D} = \bigcup_{i=0}^{3} X^i$ with low update costs, lower than was possible with the TOS or BOS endmarker representations.
We extend the pointer scheme illustrated in Example 6.10 and make the following definition.


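Definition. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$ for $k \in \mathbb{N}$ and consider a function $f : X \to \mathcal{B}^+$. Let $\rho : \mathbb{D} \to \mathcal{B}^+$ be any fixed position field pointer representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\ell(|d|)\}_{n_0},$$

where $n_0, n_i \in \mathbb{N}$ (for any $0 < i \le k$) and where $\ell$ is a representation $\ell : J \to \mathcal{B}^+$. Then $\rho$ is a TOS pointer representation.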


We use the term TOS because reading the pointer component, $\ell(|d|)$, tells us $|d|$, and we can then go directly to the field $n_{|d|}$ in order to determine the top of stack element. Clearly the representation $\rho$ from Example 6.10 is a TOS pointer representation. If $|\mathbb{D}|$ is large, and especially if $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, then the size of the pointer will grow large. Therefore, we may sometimes find it convenient to view the TOS pointer representation as a separate pointer representation.

Restricting our consideration to concatenation-preserving representations is perhaps an obvious thing to do, but let us discuss why we also require that a TOS pointer representation have fixed position fields. The fixed position field assumption is included as a consequence of our definition of a pointer representation, where we chose to encode $|d|$ rather than $|\rho(d)|$. If we were to allow variable position fields, then knowing $\ell(|d|)$ would not necessarily tell us the location of the top of the stack.

Unfortunately, requiring fixed position fields will, in general, result in "gaps" in the representation, unless $|f(x_1)| = |f(x_2)|$ for all $x_1, x_2 \in X$. Thus, if we insist on Kraft storage, a TOS pointer representation must sometimes have gaps when $|X| \ne |\mathcal{B}|$. We could, alternatively, have defined a TOS pointer representation $\rho$ to be a variable position field pointer representation $\rho : \mathbb{D} \to \mathcal{B}^+$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\ell(|\rho(d)|)\}_{0}, \qquad \text{where } n_i = |\ell(|\rho(d)|)| + \sum_{j=1}^{i-1} |f(d(j))|.$$

Such a definition would avoid the problem of having gaps in the storage of $\rho(d)$ and would not affect the storage and access results we obtain. Thus, our original definition of a TOS pointer representation is satisfactory for our purposes.
In Example 6.10, the domain size was small enough that the pointer component was able to fit in a single memory cell. For a larger but bounded domain size, we can still store a stack pointer in a fixed number of memory cells, as we do in the following example.


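Example 6.11. Let $X = \{a,b,c\}$, $\mathcal{B} = \{0,1\}$, and $\mathbb{D} = \bigcup_{i=0}^{7} X^i$. Define the function $f : X \to \mathcal{B}^+$ by

$$f(a) = 0, \quad f(b) = 10, \quad f(c) = 11,$$

and the pointer component $\ell : J \to \mathcal{B}^+$ by

$$\begin{aligned} \ell(0) &= 000 & \ell(4) &= 100\\ \ell(1) &= 001 & \ell(5) &= 101\\ \ell(2) &= 010 & \ell(6) &= 110\\ \ell(3) &= 011 & \ell(7) &= 111. \end{aligned}$$

Then we can define the TOS pointer representation $\rho : \mathbb{D} \to \mathcal{B}^+$ by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{2(i-1)+3} \cup \{\ell(|d|)\}_{0}.$$

(We assume, of course, that $L \ge 17$.) Then we have, for instance,

$$\rho(\lambda) = 000, \qquad \rho(abc) = 011\;0\;10\;11, \qquad \rho(accab) = 101\;0\;11\;11\;0\;10,$$

where the spaces separate the pointer from the fields, and the unused second cell of a field holding $f(a)$ is not part of $\rho(d)$. Notice that the representation $\rho$ achieves Kraft storage.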

We can implement the stack operations roughly as follows. For the TOP operation, we read the three pointer cells and then go to the top of the stack to look up the answer.

    Q_TOP:  temp1 ← m(2) + 2·m(1) + 4·m(0)
            if temp1 = 0 then return "Error"
            temp2 ← 2·(temp1 - 1) + 3
            if m(temp2) = 0 then return "a"
            else if m(temp2 + 1) = 0 then return "b"
            else return "c"

In order to do a PUSHx operation, we must increment the three pointer cells as we read them, and we then insert the correct item onto the stack.
    Q_PUSHx:  if m(2) = 0 then m(2) ← 1
                               goto write
              m(2) ← 0
              if m(1) = 0 then m(1) ← 1
                               goto write
              m(1) ← 0
              if m(0) = 0 then m(0) ← 1
                               goto write
              m(2) ← 1
              m(1) ← 1
              return "Error"
    write:    temp1 ← 2·(m(2) + 2·m(1) + 4·m(0)) + 1
              if x = a then m(temp1) ← 0
              else m(temp1) ← 1
                   if x = b then m(temp1 + 1) ← 0
                   else m(temp1 + 1) ← 1


For the POP operation, we need only decrement the pointer. Unfortunately, this may require accessing some pointer memory cell more than once. The following simple algorithm is one possibility.

    Q_POP:  if m(2) = 1 then m(2) ← 0
                             return
            m(2) ← 1
            if m(1) = 1 then m(1) ← 0
                             return
            m(1) ← 1
            if m(0) = 1 then m(0) ← 0
                             return
            m(1) ← 0
            m(2) ← 0
            return "Error"

Notice that this algorithm causes us to incorrectly change $m(1)$ and $m(2)$ in the case where an "Error" condition is to be returned, thus forcing us to go back and rewrite these cells.
Excluding the cases where we get an Error condition, these three algorithms give us the following access costs, for all $d \in \mathbb{D}$:

$$5 \ge \#[Q_{\mathrm{TOP}}(\rho(d))] \ge 4,$$
$$5 \ge \#[Q_{\mathrm{PUSH}x}(\rho(d))] \ge 4,$$
$$3 \ge \#[Q_{\mathrm{POP}}(\rho(d))] \ge 1. \quad ∎$$
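A Python sketch of Example 6.11 makes the quoted costs easy to check. The memory is modeled as a list of 17 cells, each operation reports its access count alongside its result, and the function names are hypothetical. It mirrors the pseudocode above, with one reading made explicit: in $Q_{\mathrm{PUSH}x}$ the second cell of a field is written only when $x$ is $b$ or $c$. Error cases cost $2r - 1 = 5$ accesses, because the changed pointer cells must be restored.

```python
# Example 6.11: f(a) = 0, f(b) = 10, f(c) = 11; 3-cell binary pointer in
# cells 0..2 (cell 0 most significant); field n_j starts at cell 2(j-1)+3.
F = {"a": "0", "b": "10", "c": "11"}
R = 3                      # number of pointer cells
CELLS = R + 2 * 7          # L >= 17


def field_start(j):
    return 2 * (j - 1) + R


def make_memory(d):
    m = [0] * CELLS
    n = len(d)
    m[0], m[1], m[2] = (n >> 2) & 1, (n >> 1) & 1, n & 1
    for j, c in enumerate(d, start=1):
        for k, bit in enumerate(F[c]):
            m[field_start(j) + k] = int(bit)
    return m


def top(m):
    """Read all R pointer cells, then 1 or 2 cells of the top field."""
    n = 4 * m[0] + 2 * m[1] + m[2]
    if n == 0:
        return "Error", R
    s = field_start(n)
    if m[s] == 0:
        return "a", R + 1
    return ("b" if m[s + 1] == 0 else "c"), R + 2


def push(m, x):
    """Increment the pointer while reading m(2), m(1), m(0); write f(x)."""
    for c in (2, 1, 0):
        if m[c] == 0:
            m[c] = 1
            break
        if c != 0:
            m[c] = 0               # carry into the next cell to the left
    else:                          # pointer was 111: undo the carries
        m[1] = m[2] = 1
        return "Error", 2 * R - 1
    n = 4 * m[0] + 2 * m[1] + m[2]  # reading all R pointer cells locates the field
    for k, bit in enumerate(F[x]):
        m[field_start(n) + k] = int(bit)
    return None, R + len(F[x])


def pop(m):
    """Decrement the pointer, stopping at the first 1 from the right."""
    for accesses, c in enumerate((2, 1, 0), start=1):
        if m[c] == 1:
            m[c] = 0
            return None, accesses
        if c != 0:
            m[c] = 1               # borrow from the next cell to the left
    m[1] = m[2] = 0                # pointer was 000: undo the borrows
    return "Error", 2 * R - 1
```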


The strategy used in Example 6.11 for implementing the stack could be used with any TOS pointer representation which has a fixed size pointer field.

Definition. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, let $r = \lceil \log_{|\mathcal{B}|}(k+1) \rceil$, and let $f$ be a function $f : X \to \mathcal{B}^+$. Suppose the pointer component $\ell$ is a one-to-one function $\ell : \{0, 1, \ldots, k\} \to \mathcal{B}^r$. Then the TOS pointer representation $\rho : \mathbb{D} \to \mathcal{B}^+$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\ell(|d|)\}_{n_0}$$

is said to be a TOS pointer representation with a fixed size pointer field.

The TOS pointer representations in both Examples 6.10 and 6.11 have fixed size pointer fields, and we implemented the stack operations in essentially the same way, first reading the pointer and then, if necessary, accessing the list component.


Theorem 6.7. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$ and let $r = \lceil \log_{|\mathcal{B}|}(k+1) \rceil$. Let $f$ be a function $f : X \to \mathcal{B}^+$ such that $\max_{x \in X} |f(x)| = t$. Consider the TOS pointer representation $\rho : \mathbb{D} \to \mathcal{B}^+$, with a fixed size pointer field, defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\ell(|d|)\}_{n_0},$$

where $\ell : J \to \mathcal{B}^r$. Then it is possible to define the representation $\ell$ in such a way that the stack operations can be implemented with algorithms which have the following access costs. For all $d \in \mathbb{D}$,

$$r + t \ge \#[Q_{\mathrm{TOP}}(\rho(d))] \ge r + 1,$$
$$r + t \ge \#[Q_{\mathrm{PUSH}x}(\rho(d))] \ge r + 1,$$
$$2r - 1 \ge \#[Q_{\mathrm{POP}}(\rho(d))] \ge 1.$$


Proof: The construction of algorithms for the TOP, PUSHx, and POP operations is the same as in Example 6.11, and we shall not present all of the details here. We define $\ell(i) \in \mathcal{B}^r$ so that when the string $\ell(i)$ is viewed as a number it is the base-$|\mathcal{B}|$ representation of $i$ (with leading 0's, if necessary, since $|\ell(i)| = r$).

We construct $Q_{\mathrm{TOP}}$ so that it reads the $r$ memory cells in the pointer and then goes to field $n_{|d|}$ to read the top stack element. Thus, $Q_{\mathrm{TOP}}$ accesses at least $r + 1$ and at most $r + t$ memory cells, depending on the size of the representation of the element at the top of the stack.

Now consider implementing a $Q_{\mathrm{PUSH}x}$ algorithm. By the way we have defined the pointer component $\ell$, it is possible to increment the pointer as we read it, if we access cells in the order $m(r-1), m(r-2), \ldots, m(1), m(0)$. (See Example 6.11 for an illustration.) After reading the $r$ pointer cells, we locate the appropriate field and write $f(x)$, a total of $r + |f(x)|$ accesses.

For the $Q_{\mathrm{POP}}$ algorithm we need only decrement the pointer. So it would never be necessary to make more than $2r - 1$ accesses, because we could just read the pointer in one pass and rewrite it in the next. On the lower bound side, we clearly need to make at least one access. ∎
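Under the choice of $\ell$ made in this proof, the $r$-digit base-$|\mathcal{B}|$ numeral with the most significant digit in cell 0, the pointer updates can be sketched in Python as follows. The function names are hypothetical and the sketch handles only the pointer, not the list component. `increment` scans from $m(r-1)$ toward $m(0)$. Because PUSHx must read the whole pointer anyway to locate the new top field, it reports $r$ pointer accesses on success. `decrement` stops at the first nonzero digit. On overflow or underflow both functions restore the digits they changed, which is where the $2r - 1$ worst case comes from.

```python
def increment(m, r, b):
    """Add 1 to the r-digit base-b pointer in m[0..r-1] (m[0] most
    significant), scanning from m[r-1].  Returns (ok, pointer accesses)."""
    for c in range(r - 1, -1, -1):
        if m[c] < b - 1:
            m[c] += 1
            # PUSHx still reads m[0..c-1] to locate the top field, so all
            # r pointer cells are accessed.
            return True, r
        if c > 0:
            m[c] = 0                  # carry into the cell to the left
    for c in range(1, r):             # overflow: restore m[1..r-1]
        m[c] = b - 1
    return False, 2 * r - 1


def decrement(m, r, b):
    """Subtract 1, scanning from m[r-1]; stop at the first nonzero digit."""
    for touched, c in enumerate(range(r - 1, -1, -1), start=1):
        if m[c] > 0:
            m[c] -= 1
            return True, touched
        if c > 0:
            m[c] = b - 1              # borrow from the cell to the left
    for c in range(1, r):             # the pointer was 0: undo the borrows
        m[c] = 0
    return False, 2 * r - 1


def value(m, r, b):
    """The number represented by the pointer cells."""
    v = 0
    for c in range(r):
        v = v * b + m[c]
    return v
```

With $b = 2$ and $r = 3$ this reproduces the pointer handling of Example 6.11.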


Notice that, using the method from Example 6.11, the $2r - 1$ upper bound on the number of accesses for the POP operation would be attained only when $|d| = 0$ and an "Error" message is returned. For $|d| \ne 0$, $r$ would be an upper bound, and we frequently would be able to do even better.

In the proof of Theorem 6.7, the only reference to the particular pointer $\ell$ we chose was in obtaining the upper bound for the cost of performing a PUSHx operation. As we argued there for the POP operation, it would always be possible to increment the pointer by making $2r - 1$ accesses. This gives us the following corollary.




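Corollary 6.7.1. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$ and let $r = \lceil \log_{|\mathcal{B}|}(k+1) \rceil$. Let $f$ be a function $f : X \to \mathcal{B}^+$ such that $\max_{x \in X} |f(x)| = t$. Consider the TOS pointer representation $\rho : \mathbb{D} \to \mathcal{B}^+$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\ell(|d|)\}_{n_0}.$$

Then for any one-to-one pointer function $\ell : J \to \mathcal{B}^r$, it is possible to implement the stack operations so as to obtain the following access costs. For all $d \in \mathbb{D}$,

$$r + t \ge \#[Q_{\mathrm{TOP}}(\rho(d))] \ge r + 1,$$
$$2r - 1 + t \ge \#[Q_{\mathrm{PUSH}x}(\rho(d))] \ge r + 1,$$
$$2r - 1 \ge \#[Q_{\mathrm{POP}}(\rho(d))] \ge 1.$$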


Theorem 6.7 and Corollary 6.7.1 gave us upper bounds on access costs for performing POP, PUSHx, and TOP operations using a TOS pointer representation with fixed position fields. The bounds depend on $r$, not on $|d|$, although the size of $r$ itself is dependent on $\max_{d \in \mathbb{D}} |d|$: $r = \lceil \log_{|\mathcal{B}|}(\max_{d \in \mathbb{D}} |d| + 1) \rceil$. Thus, when $|d|$ is small, being forced to read $r$ cells could be relatively expensive (e.g., when $r$ is large and the stacks we are representing are small). Consider, however, where these bounds came from. We can rewrite the result of Corollary 6.7.1. For any TOS pointer representation with a fixed size pointer field, we can implement the stack operations with the following access costs:

$$\#[Q_{\mathrm{TOP}}(\rho(d))] \le |\ell(|d|)| + |f(q_{\mathrm{TOP}}(d))|,$$
$$\#[Q_{\mathrm{PUSH}x}(\rho(d))] \le |\ell(|d|)| + |f(x)|,$$
$$\#[Q_{\mathrm{POP}}(\rho(d))] \le \begin{cases} 2|\ell(|d|)| - 1 & \text{for } |d| = 0,\\ |\ell(|d|)| & \text{for } |d| \ne 0, \end{cases}$$

assuming that the function $f$ is a representation and achieves Kraft storage.


Let us now extend these results to TOS pointer representations where we do not have fixed size pointer fields. We would also like to allow $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, where $k \le \infty$. From the above discussion it should be easy to see that the following theorem holds.


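Theorem 6.8. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$, where $k \le \infty$, and let $f : X \to \mathcal{B}^+$ achieve Kraft storage. Consider the TOS pointer representation $\rho : \mathbb{D} \to \mathcal{B}^+$ defined by

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\ell(|d|)\}_{n_0},$$

where $\ell$ is any representation $\ell : J \to \mathcal{B}^+$. Then it is possible to implement the stack operations so as to achieve the following access costs. For all $d \in \mathbb{D}$,

$$\#[Q_{\mathrm{TOP}}(\rho(d))] \le |\ell(|d|)| + |f(q_{\mathrm{TOP}}(d))|,$$
$$\#[Q_{\mathrm{PUSH}x}(\rho(d))] \le 2|\ell(|d|)| + |f(x)| - 1,$$
$$\#[Q_{\mathrm{POP}}(\rho(d))] \le 2|\ell(|d|)| - 1.$$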


Proof: For any TOS pointer representation, reading the pointer immediately tells us the location of the top of the stack. So we can certainly perform a TOP operation by accessing each pointer cell and then reading enough cells in the list component for us to distinguish $q_{\mathrm{TOP}}(d)$. Since $f$ achieves Kraft storage, this is precisely $|\ell(|d|)| + |f(q_{\mathrm{TOP}}(d))|$. For the PUSHx and POP operations it is, in general, necessary to rewrite the pointer, which at worst would require $2|\ell(|d|)| - 1$ accesses: one pass over $\ell(|d|)$ to read and the next to rewrite. For a POP, we need not access the list component at all, and for a PUSHx, we need to write $f(x)$ into memory. ∎
From Theorem 6.8, the issue is now to see how compact we can make our pointer component $\ell(|d|)$. Recalling the construction of the class of pointers $\ell_{\mathcal{B}}$ from Section 5.4 (see Table 5.2), we have a possible representation scheme with

$$|\ell(|d|)| = O(\log |d|).$$

Consider using this scheme to perform a PUSHx or a POP. Since each pointer is a representation of a natural number $n$, we want to be able to increment or decrement by one the number to be represented. For the scheme in Section 5.4, this means we always need to alter the "rightmost" cell in the pointer representation. Since the size of the pointer component is not fixed (in fact, it may be unbounded), there is no way to know where this rightmost cell is unless we read (most of) $\ell(|d|)$. Even then, we might be forced to backtrack. We shall construct a new pointer scheme, with the same storage cost as our previous one, but for which it will be easier to increment and decrement the pointer. This makes it, on the average, cheaper to perform a POP operation.
Recall the pointer representation scheme $\ell_{\mathcal{B}}$ as illustrated in Table 5.1:

$$\ell_{\mathcal{B}}(n) = 0^{|h(n)|}\,1\,h(n),$$

where we write $h(n)$ for $h_{\mathcal{B}}(n)$. For instance, consider

$$\ell_{\mathcal{B}}(18) = 000010011.$$

In order to perform a POP operation on a stack of length 18, we need to decrement the pointer, leaving us

$$\ell_{\mathcal{B}}(17) = 000010010.$$

Notice that we needed to alter only the last bit in the pointer, but there is no way to locate this last bit without reading the entire pointer. If we could rearrange bits so that we read the last bit (of $h(n)$) early, then whenever $n$ is even we would just change the appropriate bit to 0 and immediately be finished with our POP operation. We can do this by interspersing the bits of $\ell_{\mathcal{B}}(n)$ from the $0^{|h(n)|}$ component with those from the $h(n)$ component (using an extra 1 to denote the end of the pointer representation). Note that these two components each have the same number of bits. Since we would like to be able to read the last bit of $h(n)$ as early as possible, we reverse the bit order of $h(n)$. Such a strategy gives us

$$\Lambda_{\mathcal{B}}(18) = 0\underline{1}0\underline{1}0\underline{0}0\underline{0}1,$$
$$\Lambda_{\mathcal{B}}(17) = 0\underline{0}0\underline{1}0\underline{0}0\underline{0}1.$$

For clarity, we have underlined the bits that come from the $h(n)$ component. Some additional values of $\Lambda_{\mathcal{B}}$ are given in Table 6.1.

We now give a formal definition of the pointer representation scheme $\Lambda_{\mathcal{B}}$. We begin with the following preliminary definition, based on the definition of the string $h_{\mathcal{B}}(n)$ from Section 5.4.
Definition. For $|\mathcal{B}| \ge 2$, we define the string

$$\theta_n^{\mathcal{B}} = h_{\mathcal{B}}(n)^{R},$$

i.e., the reverse of the string $h_{\mathcal{B}}(n)$. For $1 \le i \le |\theta_n^{\mathcal{B}}|$, we write $\theta_n^{\mathcal{B}}(i)$ to denote the $i$th component of the string $\theta_n^{\mathcal{B}}$. For notational simplicity we may simply write $\theta_n$ to stand for $\theta_n^{\mathcal{B}}$. So $\theta_n(1)$ is the last character in the string $h_{\mathcal{B}}(n)$, $\theta_n(2)$ is the next to the last character in the string $h_{\mathcal{B}}(n)$, etc.


Example 6.12. Since $h_{\mathcal{B}}(18) = 0011$, we have $\theta_{18} = 1100$, and $\theta_{18}(1) = 1$, $\theta_{18}(2) = 1$, $\theta_{18}(3) = 0$, $\theta_{18}(4) = 0$. ∎


We now define the pointer representation $\Lambda_{\mathcal{B}}$ in terms of the string $\theta_n$.


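Definition. Let $|\mathcal{B}| \ge 2$. We define the pointer representation scheme $\Lambda_{\mathcal{B}}$ as follows:

$$\Lambda_{\mathcal{B}}(n) = \bigcup_{i=1}^{|\theta_n|} \{0\,\theta_n(i)\}_{2(i-1)} \cup \{1\}_{2|\theta_n|}.$$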


We illustrate the definition with an example. 


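Example 6.13. Let us determine the pointer $\Lambda_{\mathcal{B}}(26)$. Recall from Section 5.4 that $h_{\mathcal{B}}(26) = 1011$. So $\theta_{26} = 1101$, and

$$\Lambda_{\mathcal{B}}(26) = \{0\,1\}_0 \cup \{0\,1\}_2 \cup \{0\,0\}_4 \cup \{0\,1\}_6 \cup \{1\}_8 = 010100011. \quad ∎$$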
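The construction of $\Lambda_{\mathcal{B}}$ for $\mathcal{B} = \{0,1\}$ is easy to mechanize, and the Python sketch below rebuilds it from $n$. The Section 5.4 definition of $h_{\mathcal{B}}(n)$ is not restated here, so the sketch assumes that $h(n)$ is the $k$-bit binary numeral for $n - (2^k - 1)$, where $k = \lfloor \log_2(n+1) \rfloor$. This assumption reproduces $h(17) = 0010$, $h(18) = 0011$, $h(26) = 1011$, and every $h(n)$ entry of Table 6.1. The function names are hypothetical.

```python
def h(n):
    """Assumed h(n): n - (2**k - 1) written with exactly k bits,
    where k = floor(log2(n + 1)); h(0) is the empty string."""
    k = (n + 1).bit_length() - 1
    return format(n - (2 ** k - 1), "b").zfill(k) if k else ""


def ell(n):
    """Pointer l(n) = 0^{|h(n)|} 1 h(n) from Section 5.4."""
    return "0" * len(h(n)) + "1" + h(n)


def theta(n):
    """theta_n: the reverse of h(n)."""
    return h(n)[::-1]


def lam(n):
    """Lambda(n): a pair '0', theta_n(i) for each i, then a final '1'."""
    return "".join("0" + bit for bit in theta(n)) + "1"


for n in (17, 18, 26):
    print(n, ell(n), lam(n))
# 17 000010010 000100001
# 18 000010011 010100001
# 26 000011011 010100011
```

Both schemes use $2|h(n)| + 1$ cells, so interleaving the bits does not change the storage cost; only the order in which the bits can be reached changes.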


Table 6.1 gives the pointer representations $\Lambda_{\mathcal{B}}(n)$ for $0 \le n \le 33$.
Now that we have defined the pointer representation scheme $\Lambda_{\mathcal{B}}$, let us use this scheme and determine access costs for implementing the stack operations.


     n    h(n)     θ_n      Λ_𝔅(n)
     0    λ        λ        1
     1    0        0        001
     2    1        1        011
     3    00       00       00001
     4    01       10       01001
     5    10       01       00011
     6    11       11       01011
     7    000      000      0000001
     8    001      100      0100001
     9    010      010      0001001
    10    011      110      0101001
    11    100      001      0000011
    12    101      101      0100011
    13    110      011      0001011
    14    111      111      0101011
    15    0000     0000     000000001
    16    0001     1000     010000001
    17    0010     0100     000100001
    18    0011     1100     010100001
    19    0100     0010     000001001
    20    0101     1010     010001001
    21    0110     0110     000101001
    22    0111     1110     010101001
    23    1000     0001     000000011
    24    1001     1001     010000011
    25    1010     0101     000100011
    26    1011     1101     010100011
    27    1100     0011     000001011
    28    1101     1011     010001011
    29    1110     0111     000101011
    30    1111     1111     010101011
    31    00000    00000    00000000001
    32    00001    10000    01000000001
    33    00010    01000    00010000001

Table 6.1. Construction of pointer representation Λ_𝔅.
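Theorem 6.9. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, let $\mathcal{B} = \{0,1\}$, and consider a function $f : X \to \mathcal{B}^+$. Let $\rho : \mathbb{D} \to \mathcal{B}^+$ be a TOS pointer representation

$$\rho(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{\Lambda_{\mathcal{B}}(|d|)\}_{n_0},$$

where $n_0, n_i \in \mathbb{N}$. Then, for some $k_1 \in \mathbb{N}$, it is possible to implement the stack operations so as to achieve the following access costs for all $d \in \mathbb{D}$:

$$\#[Q_{\mathrm{TOP}}(\rho(d))] \le |\Lambda_{\mathcal{B}}(|d|)| + k_1,$$
$$\#[Q_{\mathrm{PUSH}x}(\rho(d))] \le |\Lambda_{\mathcal{B}}(|d|)| + |f(x)| + 2,$$
$$\#[Q_{\mathrm{POP}}(\rho(d))] \le |\Lambda_{\mathcal{B}}(|d|)| + 1.$$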


Proof: As we have previously seen, it is certainly possible to implement the TOP operation by reading the entire pointer and then going to the appropriate location to look up the answer $q_{\mathrm{TOP}}(d)$. Although a lookup of this answer might require making more than $|f(q_{\mathrm{TOP}}(d))|$ accesses, it cannot take more than some constant number of accesses, depending on details of the function $f$.

We have constructed the representation scheme $\Lambda_{\mathcal{B}}$ so that it will be easy to decrement the stack pointer. Consider the following algorithm:

    Q_POP:  if m(0) = 1 then return "Error"
            i ← 1
    loop:   if m(i) = 1 then m(i) ← 0
                             return
            m(i) ← 1
            if m(i+1) = 1 then m(i-1) ← 1
                               return
            i ← i + 2
            goto loop

In this algorithm, we read the pointer from left to right and never backtrack over more than one cell. This gives the desired bound for POP.
A similar scheme allows us to perform a PUSHx operation. 
Ousves if m(0) =1 then m(0) « 0 
m(1) «0 


m2) e1 
return 
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iel 
loop: if m(i) =O then m(i) «1 
return 
m(i) « 0 


if m(itl) =1 then m(i+l) «0 
m(i+2) © 0 
m(i+3) © 1 
return 
i © i+2 
goto loop 
In this case we read the entire pointer and, although we never need to backtrack, we sometimes need to rewrite two additional cells. Having incremented the pointer, we can insert $f(x)$ in the appropriate field with $|f(x)|$ accesses. ∎
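The two pointer-update loops can be exercised directly. The Python sketch below is only an illustration, and it rests on assumptions read off Table 6.1 rather than on a formal definition: the odd cells hold the digits of $|d|$ in bijective base 2 (digit values 1 and 2 written as bits 0 and 1), least significant digit first; the even cells hold 0 while more digits follow, with a final 1 as the end marker; a Python list stands in for the memory $m$; and an access is counted as a distinct cell read or written. The names `Memory`, `encode`, `decode`, `pop_pointer` and `push_pointer` are hypothetical.

```python
class Memory:
    """A list of cells that records which cells an operation reads or writes."""

    def __init__(self, cells, size=64):
        self.m = list(cells) + [0] * (size - len(cells))
        self.touched = set()

    def read(self, i):
        self.touched.add(i)
        return self.m[i]

    def write(self, i, value):
        self.touched.add(i)
        self.m[i] = value


def encode(n):
    """A_2^0(n) in the layout assumed from Table 6.1: (0, digit) pairs, then a 1."""
    cells = []
    while n > 0:
        n -= 1                  # bijective base 2: digits 1 and 2 stored as bits 0 and 1
        cells += [0, n % 2]     # 0 means "another digit follows"
        n //= 2
    return cells + [1]          # end marker


def decode(m):
    """Read the pointer value back from a memory image, left to right."""
    n, weight, i = 0, 1, 1
    while m[i - 1] == 0:
        n += weight * (m[i] + 1)
        weight *= 2
        i += 2
    return n


def pop_pointer(mem):
    """Pointer part of Q_POP: decrement, backtracking over at most one cell."""
    if mem.read(0) == 1:
        raise IndexError("POP on an empty stack")
    i = 1
    while True:
        if mem.read(i) == 1:
            mem.write(i, 0)
            return
        mem.write(i, 1)
        if mem.read(i + 1) == 1:
            mem.write(i - 1, 1)  # the end marker moves one digit to the left
            return
        i += 2


def push_pointer(mem):
    """Pointer part of Q_PUSHx: increment, sometimes rewriting two extra cells."""
    if mem.read(0) == 1:
        mem.write(0, 0)
        mem.write(1, 0)
        mem.write(2, 1)
        return
    i = 1
    while True:
        if mem.read(i) == 0:
            mem.write(i, 1)
            return
        mem.write(i, 0)
        if mem.read(i + 1) == 1:
            mem.write(i + 1, 0)
            mem.write(i + 2, 0)
            mem.write(i + 3, 1)  # the end marker moves one digit to the right
            return
        i += 2
```

For instance, `encode(25)` gives `[0, 0, 0, 1, 0, 0, 0, 1, 1]`, the 1010 row of Table 6.1, and applying `pop_pointer` to `Memory(encode(5))` touches 4 cells. Cells beyond the end marker may hold stale values; neither update ever reads them before overwriting.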


We can see that we have improved our previous access costs, so that each stack operation can be implemented with at most $O(\log|d|)$ accesses in the worst case. In fact, the next result shows that we can expect to do even better for a POP operation: for a very reasonable probability distribution, we can expect to make, on the average, only a constant number of accesses.


Theorem 6.10. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, let $B = \{0,1\}$, and consider a function $f: X \to B^+$. Let $p: \mathbb{D} \to B^+$ be a TOS pointer representation

$$p(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{A^0_2(|d|)\}_{n_0},$$

where $n_0, n_i \in \mathbb{N}$. Assume that there is a monotonically nonincreasing probability distribution $P$ on the stack states:

$$P(|d| = n+1) \le P(|d| = n).$$

Then it is possible to implement the POP operation so that

$$\operatorname{avg} \#[Q_{POP}(p(d))] \le k$$

for some $k \in \mathbb{N}$.


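Proof: Consider the algorithm $Q_{POP}$ presented in the proof of Theorem 6.9. Note that 2 accesses are required for $|d| = 2, 4, 6, 8, 10, \ldots$, that 4 accesses are required for $|d| = 5, 9, 13, 17, \ldots$, that 6 accesses are required for $|d| = 11, 19, 27, 35, \ldots$, etc. Denote $P(|d| = i)$ by $p_i$. Since $p_{i+1} \le p_i$, we know that

$$p_2 + p_4 + p_6 + p_8 + \cdots \le p_1 + p_3 + p_5 + p_7 + \cdots,$$

and so $p_2 + p_4 + p_6 + p_8 + \cdots \le \tfrac{1}{2}$. Similarly,

$$p_5 + p_9 + p_{13} + p_{17} + \cdots \le \tfrac{1}{4},$$
$$p_{11} + p_{19} + p_{27} + p_{35} + \cdots \le \tfrac{1}{8},$$
$$p_{23} + p_{39} + p_{55} + p_{71} + \cdots \le \tfrac{1}{16},$$

etc. Notice that extra work is required to perform the POP whenever $|d| = 1$, $|d| = 3$, $|d| = 7$, $|d| = 15$, etc. (i.e., when $|d| = 2^i - 1$ for some $i \in \mathbb{N}$). In that case $2i + 1$ accesses are made, and since each of the $2^i$ probabilities $p_0, \ldots, p_{2^i - 1}$ is at least $p_{2^i - 1}$, we have $p_{2^i - 1} \le 2^{-i}$. Thus,

$$\sum_{i=0}^{\infty} p_i \, \#[Q_{POP}(p(d)) \mid |d| = i] \le p_0 + \sum_{j=1}^{\infty} \frac{2j}{2^j} + \sum_{i=1}^{\infty} \frac{2i+1}{2^i} \le 1 + 4 + 5 = 10,$$

so the average number of accesses is bounded by a constant. ∎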
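Theorem 6.10 can also be checked numerically. The sketch below is an illustration under the same layout assumed from Table 6.1 (bijective base-2 digits, least significant first, an access being a distinct cell touched); it recomputes the cost of the POP pointer update for each stack length and averages it under two nonincreasing distributions chosen only for illustration, a uniform one on $\{0, \ldots, N\}$ and a renormalised geometric one. All function names are hypothetical.

```python
def pointer(n):
    """A_2^0(n) in the layout assumed from Table 6.1: (0, digit) pairs, then a 1."""
    cells = []
    while n > 0:
        n -= 1
        cells += [0, n % 2]
        n //= 2
    return cells + [1]


def pop_cost(n):
    """Distinct cells touched by the POP pointer update when |d| = n."""
    m = pointer(n)
    if m[0] == 1:
        return 1                  # empty stack: read m(0) and report an error
    touched = {0}
    i = 1
    while True:
        touched.add(i)            # read (and rewrite) digit cell m(i)
        if m[i] == 1:
            return len(touched)
        touched.add(i + 1)        # look at the next marker cell
        if m[i + 1] == 1:
            touched.add(i - 1)    # move the end marker one digit left
            return len(touched)
        i += 2


def average_pop_cost(probabilities):
    """Expected POP cost when P(|d| = n) = probabilities[n]."""
    return sum(p * pop_cost(n) for n, p in enumerate(probabilities))


def uniform(N):
    """Uniform distribution on {0, ..., N}; nonincreasing in n."""
    return [1.0 / (N + 1)] * (N + 1)


def geometric(q, N):
    """Weights q**n on {0, ..., N}, renormalised; nonincreasing for 0 < q < 1."""
    weights = [q ** n for n in range(N + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

The per-length costs reproduce the 2, 4, 6, ... pattern and the $2i + 1$ cost at $|d| = 2^i - 1$ used in the proof, and `average_pop_cost(uniform(N))` stays bounded as `N` grows instead of growing like $\log N$.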


The following theorem summarizes the results we have just derived.

Theorem 6.11. Let $\mathbb{D} = \bigcup_{i=0}^{\infty} X^i$, let $B = \{0,1\}$, and consider a function $f: X \to B^+$. Let $p: \mathbb{D} \to B^+$ be a TOS pointer representation

$$p(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_i} \cup \{A^0_2(|d|)\}_{n_0},$$

where $n_0, n_i \in \mathbb{N}$. Assume that there is a monotonically nonincreasing probability distribution $P$ on the stack states:

$$P(|d| = n+1) \le P(|d| = n).$$

Then $A^0_2$ achieves Kraft storage, and, for some $k_1, k_2 \in \mathbb{N}$, it is possible to implement the stack operations so as to achieve the following access costs:

$$\#[Q_{TOP}(p(d))] \le 2\lfloor \log_2(|d|+1) \rfloor + k_1,$$
$$\#[Q_{PUSHx}(p(d))] \le 2\lfloor \log_2(|d|+1) \rfloor + |f(x)| + 3,$$
$$\operatorname{avg} \#[Q_{POP}(p(d))] \le k_2.$$

Proof: The result for the POP operation is the result of Theorem 6.10. We obtain the inequalities for TOP and PUSHx by recalling that $|A^0_2(n)| = |e^0_2(n)|$ and by making use of Lemma 5.7 and Theorems 5.18 and 6.9. The Kraft storage follows from Theorem 5.19. ∎
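The step from Theorem 6.9 to these logarithmic bounds is the length of the pointer: if each bijective base-2 digit of $|d|$ uses two cells and the end marker one more (the layout assumed from Table 6.1), then $|A^0_2(n)| = 2\lfloor \log_2(n+1) \rfloor + 1$, and the PUSHx bound $|A^0_2(|d|)| + |f(x)| + 2$ becomes $2\lfloor \log_2(|d|+1) \rfloor + |f(x)| + 3$. A small Python check of that identity under this assumption; the helper names are hypothetical.

```python
def pointer_length(n):
    """|A_2^0(n)|: two cells per bijective base-2 digit of n plus one end marker."""
    digits = 0
    while n > 0:
        n = (n - 1) // 2      # strip one bijective base-2 digit
        digits += 1
    return 2 * digits + 1


def floor_log2(x):
    """Exact floor(log2 x) for a positive integer x."""
    return x.bit_length() - 1


def push_bound(n, f_len):
    """Right-hand side of the PUSHx bound of Theorem 6.11 for |d| = n and |f(x)| = f_len."""
    return 2 * floor_log2(n + 1) + f_len + 3


for n in (0, 1, 2, 3, 6, 7, 1000):
    print(n, pointer_length(n), push_bound(n, f_len=1))
```

Using `bit_length` rather than `math.log2` keeps $\lfloor \log_2 \rfloor$ exact for every integer, with no floating-point rounding near powers of 2.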


We have chosen to prove these results for the pointer scheme $A^0_2$, but the scheme can be extended to include $A^i_{|B|}$. As it turns out, the access costs we obtain are even better than for $A^0_2$, although the results are all of the same order of growth. Because the details would tend to obscure an understanding of the class of pointer schemes $A$, we shall not formally define $A^i_{|B|}$ for $|B| > 2$ or $i > 0$. But let us indicate informally how these extensions could be made. Note that we shall always have

$$|A^i_{|B|}(n)| = |e^i_{|B|}(n)|,$$

and, in fact, the string $A^i_{|B|}(n)$ is just a rearrangement of the elements in the string $e^i_{|B|}(n)$.

Consider $|B| = 3$ and recall $h_3(n)$ from Table 5.2. Since we want to construct $A^0_3(n)$ in such a way that it is a rearrangement of the elements in $e^0_3(n)$, recall that

$$e^0_3(n) = h_2(|h_3(n)|)\, 2\, h_3(n).$$

In this case the first (pointer) component of $e^0_3(n)$ has only about $\log_2 |h_3(n)|$ elements, whereas the second (list) component has $|h_3(n)|$ elements. So we clearly cannot just use every other cell for the first component, as we did with $A^0_2(n)$. Referring again to Table 5.2, we see that our pointer component has a 0 when the list component has length 1, has a 1 when the list component has length 2, has 00 when the list component has length 3, etc. So

$$|h_3(n)| = \sum_{i=1}^{|h_2(|h_3(n)|)|} 2^{i-1}(1 + \delta_i),$$

where $\delta_i$ denotes the $i$th symbol of $h_2(|h_3(n)|)$. When $|h_3(n)| = 1$, $A^0_3(n)$ is of the form 0 _ 2; when $|h_3(n)| = 2$, it is of the form 1 _ _ 2; etc., where each blank holds one list symbol. This scheme is illustrated in Figure 6.1. The string $e^0_3$ is written out in blocks of size $2^i$, and the coefficient of each block, 0 or 1, tells whether there are $2^i$ or $2 \cdot 2^i$ elements, respectively, in that block. Rather than attempt to say more in words, we refer the reader to Table 6.2.
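The block structure just described can be made concrete. The following Python sketch builds $A^0_3$ from its list component under assumptions taken from Table 6.2: the coefficients are the symbols of $h_2(|h_3(n)|)$, first symbol first; each coefficient is written immediately before its block; and the symbol 2 closes the string. `bijective_bits` and `a03` are hypothetical names, and the internal assertion checks that the blocks exactly exhaust the list component.

```python
def bijective_bits(n):
    """h_2(n) as a list of bits, first symbol least significant (the order of Table 6.2)."""
    bits = []
    while n > 0:
        n -= 1
        bits.append(n % 2)
        n //= 2
    return bits


def a03(h3):
    """Sketch of A_3^0 for a list component h3 given as a string over {0, 1, 2}.

    Block i holds 2**i list symbols if its coefficient is 0 and 2 * 2**i if it is 1;
    each coefficient is written just before its block and the symbol 2 ends the string.
    """
    coefficients = bijective_bits(len(h3))
    out, pos = [], 0
    for i, c in enumerate(coefficients):
        size = (2 ** i) * (1 + c)
        out.append(str(c))
        out.append(h3[pos:pos + size])
        pos += size
    assert pos == len(h3), "the blocks should exactly cover the list component"
    return "".join(out) + "2"
```

For example, `a03("00")` returns `"1002"` and `a03("01000")` returns `"00110002"`, matching the corresponding rows of Table 6.2.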


[Figure: form of $A^0_3(n)$]


Figure 6.1. Outline of scheme for $A^0_3(n)$.


It is also possible to construct $A^i_2$ for $i \ge 1$. The procedure is outlined in Figure 6.2. Notice that we write the list part of $e^1_2(n)$, as much as possible, in blocks of size $1, 2, 2^2, 2^3, 2^4$, etc. Of course, when $|h_2(n)|$ is not of the form $\sum_{i=0}^{k} 2^i$ (i.e., $|h_2(n)| \ne 1, 3, 7, 15$, etc.), some digits of $h_2(n)$ will be left over. In particular, let

$$r = \min\Big\{ j \;\Big|\; \sum_{i=0}^{j} 2^i > |h_2(n)| \Big\}.$$

Then we can write the first $\sum_{i=0}^{r-1} 2^i$ elements in blocks whose sizes are powers of 2, each block preceded by a 0; a 1 indicates that we do not want to continue reading the next
successively larger block. We must then represent the remaining $|h_2(n)| - \sum_{i=0}^{r-1} 2^i$ elements (fewer than $2^r$ of them) as blocks of sizes $1, 2, \ldots, 2^{r-1}$, each block preceded by a 1 or a 0 to indicate its presence or absence. Table 6.2 presents sample values.

list component   $A^0_3$      $A^1_2$
(empty)                       1
0                002          0010
1                012          0110
00               1002         00110
10               1102         01110
01               1012         00111
11               1112         01111
000              000002       00000100
100              010002       01000100
010              000102       00010100
110              010102       01010100
001              000012       00001100
101              010012       01001100
011              000112       00011100
111              010112       01011100
0000             1000002      000001100
1000             1100002      010001100
0100             1010002      000101100
1100             1110002      010101100
0010             1000102      000011100
1010             1100102      010011100
0110             1010102      000111100
1110             1110102      010111100
0001             1000012      000001110
1001             1100012      010001110
0101             1010012      000101110
1101             1110012      010101110
0011             1000112      000011110
1011             1100112      010011110
0111             1010112      000111110
1111             1110112      010111110
00000            00100002     0000010100
10000            01100002     0100010100
01000            00110002     0001010100

Table 6.2. Construction of pointer representations $A^0_3$ and $A^1_2$.


[Figure: outline of the scheme for $A^1_2(n)$]


Figure 6.2. Outline of scheme for $A^1_2(n)$.
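The $A^1_2$ construction can likewise be sketched in Python, with explicit assumptions. The list component is taken to be a string over {0, 1}. After the stop bit, the leftover symbols (fewer than $2^r$ of them) are assumed to be written as optional blocks of sizes $1, 2, \ldots, 2^{r-1}$, smallest first, each announced by a presence bit, and which leftover symbols go into which block is a guess. These choices reproduce the Table 6.2 entries for lists of length at most 5, but they should not be read as the thesis's formal definition; `a12` is a hypothetical name.

```python
def a12(h2):
    """Sketch of A_2^1 for a list component h2 (a string over {0, 1})."""
    L = len(h2)
    # r = min{ j : 2^0 + ... + 2^j > L }, using 2^0 + ... + 2^j = 2^(j+1) - 1
    r = 0
    while 2 ** (r + 1) - 1 <= L:
        r += 1
    out, pos = [], 0
    for i in range(r):                 # full blocks of sizes 1, 2, ..., 2^(r-1)
        out.append("0")                # 0: another block follows
        out.append(h2[pos:pos + 2 ** i])
        pos += 2 ** i
    out.append("1")                    # 1: stop reading successively larger blocks
    remainder = L - pos                # fewer than 2^r symbols are left over
    for i in range(r):                 # presence bit for an optional block of size 2^i
        if (remainder >> i) & 1:
            out.append("1")
            out.append(h2[pos:pos + 2 ** i])
            pos += 2 ** i
        else:
            out.append("0")
    assert pos == L, "every list symbol should be placed exactly once"
    return "".join(out)
```

For example, `a12("")` returns `"1"` and `a12("0001")` returns `"000001110"`. The presence bits spell the leftover count in binary, least significant bit first, so the reader can always tell how many symbols follow.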
6.5 The BOS Pointer Representation


For the sake of completeness, let us briefly mention the bottom of stack pointer representation.


Example 6.14. Recall Example 6.10, where we had $X = \{a,b,c,d\}$, $B = \{0,1,2,3\}$, and $\mathbb{D} = \bigcup_{i=0}^{3} X^i$. The function $f: X \to B^+$ is defined by
f(a) = 0
f(b) = 1
f(c) = 2
f(d) = 3
and the pointer $\theta: J \to B^+$ is defined by
θ(0) = 0
θ(1) = 1
θ(2) = 2
θ(3) = 3
Then we can define the concatenation-preserving pointer representation $p: \mathbb{D} \to B^+$ by

$$p(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{|d|+1-i} \cup \{\theta(|d|)\}_0.$$

For instance,

p(Λ) = 0
p(d) = 13
p(cab) = 3102


Assuming $L$ is large enough to represent any $d \in \mathbb{D}$ (i.e., $L \ge 4$), let us construct algorithms to implement the stack operations. In order to perform a POP operation, we must not only decrement the pointer but also shift the contents of all of the memory cells left by one.


Q_POP:  if m(0) = 0 then return "Error"
        m(0) ← m(0) - 1
        i ← 1
        while i ≤ m(0) do  m(i) ← m(i+1)
                           i ← i + 1


Similarly, the PUSHx operation requires that the contents of each cell be shifted right by one.


Q_PUSHx: if m(0) = 3 then return "Error"
         m(0) ← m(0) + 1
         temp1 ← m(0)
         temp2 ← m(1)
         m(1) ← f(x)
         i ← 2
         while i ≤ temp1 do  temp3 ← m(i)
                             m(i) ← temp2
                             temp2 ← temp3
                             i ← i + 1
The TOP operation is much easier.
Q_TOP:  if m(0) = 0 then return "Error"
        return m(1)


We can extend the pointer scheme in the previous example and define the BOS pointer representation in the obvious way.


Definition. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$ for $k \in \mathbb{N}$, and consider a function $f: X \to B^+$. Let $p: \mathbb{D} \to B^+$ be any pointer representation

$$p(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\theta(|d|)\}_{n_0},$$

where $n_0, n_i \in \mathbb{N}$ (for any $0 \le i \le k$) and where $\theta$ is a representation $\theta: J \to B^+$. Then $p$ is a bottom of stack (BOS) pointer representation.


The types of arguments used in the preceding sections can be used to determine the access costs for implementing the BOS pointer representation. For PUSHx or POP, the elements in the stack will all have to be moved, requiring an access to each $n_i$ field and also reading the entire pointer component (assuming the pointer achieves Kraft storage). A TOP operation, however, is cheap, since the top element is always located in the same field, assuming, of course, that there is a TOP element.
Theorem 6.12. Let $\mathbb{D} = \bigcup_{i=0}^{k} X^i$ for $k \in \mathbb{N}$, and let the function $f: X \to B^+$ be a representation that achieves Kraft storage. Consider the BOS pointer representation $p: \mathbb{D} \to B^+$ defined by

$$p(d) = \bigcup_{i=1}^{|d|} \{f(d(i))\}_{n_{|d|+1-i}} \cup \{\theta(|d|)\}_{n_0},$$

where $n_0, n_i \in \mathbb{N}$ and where $\theta$ is any representation $\theta: J \to B^+$. Then any implementation of the stack operations will have the following access costs:

$$\#[Q_{TOP}(p(d))] \ge 1,$$
$$\#[Q_{PUSHx}(p(d))] \ge |\theta(|d|)| + |d| + 1,$$
$$\#[Q_{POP}(p(d))] \ge \begin{cases} |\theta(|d|)| + |d| & \text{if } |d| \ne 0, \\ 1 & \text{if } |d| = 0. \end{cases}$$


We do not formally prove this theorem because the proof is similar to 
arguments we have already made and because we can now already see that the 
stack operations would have higher access costs than we would in general want. 
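The three routines of Example 6.14 can be run as they stand. The following Python sketch, an illustration rather than part of the thesis, keeps the pointer in m(0) and the top element in m(1) as in the example, uses f(a) = 0, f(b) = 1, f(c) = 2, f(d) = 3 with room for stacks of length at most 3, and records the distinct cells each operation touches. It shifts right with a downward loop rather than with the temporaries used in Q_PUSHx; the set of cells touched is the same. The class and method names are hypothetical.

```python
X_CODE = {"a": 0, "b": 1, "c": 2, "d": 3}   # f from Example 6.14
CAPACITY = 3                                # D holds stacks of length at most 3


class BOSStack:
    """Example 6.14 memory: m(0) is the length pointer, m(1) always holds the top."""

    def __init__(self):
        self.m = [0] * (CAPACITY + 1)
        self.accessed = set()

    def _read(self, i):
        self.accessed.add(i)
        return self.m[i]

    def _write(self, i, value):
        self.accessed.add(i)
        self.m[i] = value

    def top(self):
        self.accessed = set()
        if self._read(0) == 0:
            raise IndexError("TOP of an empty stack")
        return self._read(1)

    def pop(self):
        self.accessed = set()
        if self._read(0) == 0:
            raise IndexError("POP of an empty stack")
        length = self._read(0) - 1
        self._write(0, length)
        for i in range(1, length + 1):      # shift every element left by one
            self._write(i, self._read(i + 1))

    def push(self, x):
        self.accessed = set()
        length = self._read(0)
        if length == CAPACITY:
            raise OverflowError("stack is full")
        self._write(0, length + 1)
        for i in range(length + 1, 1, -1):  # shift every element right by one
            self._write(i, self._read(i - 1))
        self._write(1, X_CODE[x])

    def image(self):
        """The cells in use, written as in Example 6.14 (pointer first)."""
        return "".join(str(v) for v in self.m[: self.m[0] + 1])
```

Pushing c, a and b in turn leaves `image()` equal to `"3102"`, the value of p(cab) in Example 6.14, and a following `pop()` touches 4 cells, which is exactly the lower bound $|\theta(|d|)| + |d| = 1 + 3$ of Theorem 6.12.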

Note that the four stack representations we have discussed may all achieve Kraft storage, but their access costs differ greatly. We summarize in Table 6.3 some of the lower bounds we have determined in this chapter.
CHAPTER 7
QUEUES 


The same framework that we have developed in this thesis can also be used to analyze queues. Although we shall not prove any results in this chapter, let us point out some of the complexities inherent in queues that are not present in stacks.

Recall that a queue differs from a stack in that items are inserted at one end and deleted from the other. If we want to achieve Kraft storage in a representation of a queue, we know that we can use only a single pointer. This, however, does not allow multiple representations, and so updating operations will necessarily have high access costs. In all of the examples we consider in this chapter, we shall assume a problem domain alphabet $X = \{a,b,c,d\}$ and assume that $|B|$ is large enough that a pointer always fits in a single cell in the cases we consider. We shall also assume that $a \in X$ is represented by $0 \in B$, $b$ by 1, $c$ by 2, and $d$ by 3.


Example 7.1. Suppose we have a three-element queue. Consider implementing such a queue with a single pointer and holding the other end fixed.
a) Let the rear of the queue be fixed; i.e., all insertions are made to the same cell. Thus, the entire contents of the queue must be slid each time an insertion is made. On the other hand, we need only decrement the pointer to delete an item from the queue. For instance, suppose our queue initially has three elements inserted: b, a, c. So m(0) = 2, m(1) = 0, and m(2) = 1:

2 0 1      Pointer to front: 3
If we DELETE an item we are left with

2 0        Pointer to front: 2

If we now INSERT(d), we obtain

3 2 0      Pointer to front: 3
Notice that each of the elements already on the queue had to be moved when we made an INSERT. With this scheme, a DELETE operation requires only a single access in order to decrement the pointer. An INSERT operation, however, requires $|d| + 1$ accesses, where $|d|$ is the initial queue size.
b) If the front of the queue were stationary, then it would be insertion that is easy to perform. As above, suppose we have initially inserted b, a, c on the queue:
1 0 2      Pointer to rear: 3
A DELETE operation requires moving the contents of each element in the queue:
0 2        Pointer to rear: 2
Now an INSERT(d) is simple:
0 2 3      Pointer to rear: 3
Using this second scheme, an INSERT operation requires two accesses, one to the pointer and one to insert the new element. On the other hand, the DELETE operation requires accessing every element in the queue (as well as the pointer), $|d| + 1$ accesses. ∎


The tradeoff in the preceding example suggests that we do not want to consider separately the access costs for the INSERT and DELETE operations; instead, we might want to consider the cost of a DELETE-INSERT pair of operations. In Example 7.1a we found that an INSERT had cost $|d| + 1$ and a DELETE had cost 1, a total cost of $|d| + 2$ accesses for the DELETE-INSERT pair. In Example 7.1b, an INSERT had cost 2 and a DELETE had cost $|d| + 1$, a total cost of $|d| + 3$.

The expense involved in the INSERT or DELETE operation in Example 7.1 was due to the fact that we were forced to always maintain one end of the queue fixed. Of course, if we were to allow two pointers, then we would not have this problem. Instead, let us consider a scheme where we allow a queue to have one end in one of, say, two positions.
Example 7.2. Reconsider Example 7.1b, but assume that the pointer is large enough that one bit can be reserved to indicate whether the "fixed" end of the queue is in cell 0 or in cell 1. Suppose our initial queue state is, as before:

1 0 2      Pointer to rear: 3   Front: 0
Now if we do a DELETE, we do not need to move any of the list elements:

_ 0 2      Pointer to rear: 3   Front: 1

An INSERT(d) operation gives:

_ 0 2 3    Pointer to rear: 4   Front: 1
Unfortunately, another DELETE will require moving the queue:

2 3        Pointer to rear: 2   Front: 0
Finally, one more INSERT(a):

2 3 0      Pointer to rear: 3   Front: 0
This effectively brings us back to our initial state (although the actual queue elements differ). Notice that the four operations we performed required, in order, 1, 2, $|d| + 2$, and 2 accesses, where $|d|$ refers to the size of our initial queue state before the two pairs of DELETE-INSERT operations were performed. ∎


So in Example 7.2, by reserving one bit of the pointer to indicate the location of the front of the queue, we used a total of $|d| + 7$ accesses, only $(|d| + 7)/2$ accesses on the average for a DELETE-INSERT pair. On the other hand, without this extra bit we were forced in Example 7.1 to make $|d| + 2$ accesses for a DELETE-INSERT pair. So we were able not only to delay the heavy cost of sliding the queue, but in fact to decrease the average cost of a DELETE-INSERT pair. Let us use the same trick again and reserve two bits to tell us where the front of the queue is located; i.e., the front of the queue may be in any of cells 0, 1, 2, 3.


Example 7.3. Given an initial queue 2 1 0 3, let us perform a sequence of four DELETE-INSERT pairs of operations, keeping track of the numbers of accesses.
               2 1 0 3
DELETE:        _ 1 0 3            1
INSERT(a):     _ 1 0 3 0          2
DELETE:        _ _ 0 3 0          1
INSERT(c):     _ _ 0 3 0 2        2
DELETE:        _ _ _ 3 0 2        1
INSERT(b):     _ _ _ 3 0 2 1      2
DELETE:        0 2 1              |d| + 4
INSERT(a):     0 2 1 0            2
This gives a total of $|d| + 15$ accesses, an average of $(|d| + 15)/4$ accesses per DELETE-INSERT pair. ∎


In general, if we reserve $k$ bits of the pointer to indicate the location of one end of the queue, then there are $2^k$ possible representations of each queue, and a DELETE-INSERT pair requires, on the average,

$$\frac{|d| + 2k + 3(2^k - 1) + 2}{2^k} = \frac{|d|}{2^k} + 3 + \frac{2k - 1}{2^k}$$

accesses.
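The bookkeeping of Examples 7.2 and 7.3 and of the general formula can be replayed mechanically. The Python sketch below is an illustration that takes its cost model from those examples rather than deriving it: a DELETE that only advances the front costs 1, an INSERT costs 2, and the DELETE that would push the front past cell $2^k - 1$ slides the queue back to cell 0 at a cost of $|d| + 2k$ (the $|d| + 2$ and $|d| + 4$ of the two examples). `run_pairs` and `average_formula` are hypothetical names, and queue elements are the digit codes used above.

```python
from fractions import Fraction


def run_pairs(queue, k, inserts):
    """Replay DELETE-INSERT pairs when k pointer bits locate the front of the queue.

    queue is the initial contents, front first; inserts lists the symbols to INSERT.
    Costs follow Examples 7.2 and 7.3 (an assumption, not derived here).
    Returns the total cost and the memory layout after every operation,
    with '_' marking a cell to the left of the current front.
    """
    cells, front, size = list(queue), 0, len(queue)
    cost, layouts = 0, []
    for x in inserts:
        cells = cells[1:]                 # DELETE the front element
        if front + 1 < 2 ** k:
            front += 1                    # only the front bits change: 1 access
            cost += 1
        else:
            front = 0                     # slide the queue back to cell 0
            cost += size + 2 * k
        layouts.append("_" * front + "".join(cells))
        cells.append(x)                   # INSERT at the rear: pointer plus one cell
        cost += 2
        layouts.append("_" * front + "".join(cells))
    return cost, layouts


def average_formula(size, k):
    """Average cost of a DELETE-INSERT pair over one full cycle of 2^k pairs."""
    return Fraction(size + 2 * k + 3 * (2 ** k - 1) + 2, 2 ** k)
```

For example, `run_pairs("2103", 2, "0210")` returns a total cost of 19, that is $|d| + 15$ with $|d| = 4$, together with the layouts of Example 7.3; dividing the cost of $2^k$ pairs by $2^k$ gives the same value as `average_formula`.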

Thus, we have seen that a one-pointer scheme allows no multiple representations, and we may achieve Kraft storage. Using a two-pointer scheme, the queue could be located anywhere in memory (within the range of the pointers) and may, in fact, drift throughout memory. An intermediate scheme has a single pointer which has enough room for $|d|$ with one or more extra bits reserved to indicate the location of one end of the queue. In this latter case, we not only defer but actually save in our access cost. This illustrates not only a storage-access tradeoff but also a tradeoff with multiplicity of representation, and we have a nice continuum between the one- and two-pointer cases.

Suppose we do want to achieve Kraft storage and are using a single pointer. It is interesting to consider how many accesses are required in order to perform a DELETE-INSERT pair of operations. If the queue is always of a fixed size $k$ (i.e., the only operations performed are DELETE-INSERT(a) pairs), then, somewhat surprisingly, it is possible to represent the queues in memory in such a way that the average number of cells accessed is a constant independent of the length $|d|$. On the other hand, suppose we insist that the representation function $p$ have the constraint that $p(d)$ is a permutation of $d$ and that $d(i)$ always maps to the same memory cell(s), for all $1 \le i \le |d|$. Then it can, in fact, be shown that a
DELETE-INSERT pair of operations performed on all queues of a fixed length k 
will have an average access cost of at least (By +k. Thus, for most natural 


encoding schemes it will be necessary to access essentially $|d|$ cells.
CHAPTER 8
CONCLUSIONS 


In this thesis we have explored what it means for a list to be information-theoretically optimal, in the sense that it achieves Kraft storage and Kraft access. We first examined the full set of table lookup questions and showed that if we are considering problem domains of the form $\mathbb{D} = \bigcup_{i \in J} X^i$, then it is possible to achieve both bounds simultaneously only for domains $\mathbb{D} = X^n$ and $\mathbb{D} = \{\Lambda\} \cup X^n$. This corresponds to a notion of independence; essentially, it must be the case that no matter what the value $d(i) \in X$, $d(i+1)$ might take on any value in $X$. If we were to determine $d(i) = \Lambda$, then it would have to be the case that $d(i+1) = \Lambda$, and we would not have independence. Of course, we did see that there is a perhaps surprising exception, namely, when $\mathbb{D} = \{\Lambda\} \cup X^n$ and $|X| = 2$.

As a consequence of this work, we were able to show that it is never possible to achieve both Kraft storage and Kraft access for many common list representation schemes. The only exception was for a fixed-size representation, when $\mathbb{D} = X^n$. Since we are here primarily interested in variable-length lists, it is clear that we will not be able to achieve both.

We discussed four natural stack representation schemes: TOS endmarker, BOS endmarker, TOS pointer, and BOS pointer. We were able to obtain fairly tight lower bounds on access costs for performing POP, PUSHx, and TOP operations; those results are summarized in Table 6.3. We showed that endmarker representations are necessarily expensive to update. On the constructive side, we developed a representation scheme for a TOS pointer that is storage optimal and does quite well for access. Assuming a monotonically nonincreasing probability distribution on stack lengths, we were able to obtain the following access costs:

$$\#[Q_{TOP}(p(d))] \le 2\lfloor \log_2(|d|+1) \rfloor + k_1 = O(\log |d|),$$
$$\#[Q_{PUSHx}(p(d))] \le 2\lfloor \log_2(|d|+1) \rfloor + k_2 = O(\log |d|),$$
$$\operatorname{avg} \#[Q_{POP}(p(d))] \le k_3,$$

for $k_1, k_2, k_3 \in \mathbb{N}$. The bounds we obtained give an indication as to why pointer representations are so commonly used in the practical implementation of stacks.

In the discussion of stacks, we were forced to examine separately several classes of representations. It would be nice if there were some more general characterization that would allow us to make more general statements. For instance, is it possible for any implementation to perform both a PUSHx and a POP in a constant number of accesses?

The model that we used is capable of more generalization. For instance, instead of considering access costs for performing only a single operation, we might wish to perform a sequence of operations. Also, our definition of access or storage costs could be altered to correspond to the desired application; we might even be able to consider some sort of hierarchical memory structure.

There remains a great deal of work to be done. Perhaps the most obvious is the need to apply the techniques used in this thesis in order to examine other types of lists. We briefly discussed queues, but it is clear that queues raise a lot of issues that were not present with stacks. The flavor of some preliminary results was indicated in that chapter. It appears that deques are a straightforward extension of queues, but there remain many other types of lists to be explored. In addition, it would be interesting to know whether similar arguments could be applied to trees. Some of the techniques discussed may also be useful in the analysis of hash tables.